Fifty years have gone by since the publication of the first paper on clustering based on fuzzy sets theory. In 1965, L.A. Zadeh published “Fuzzy Sets” [335]. Only one year later, the first effects of this seminal paper began to emerge with the pioneering paper on clustering by Bellman, Kalaba and Zadeh [33], in which they proposed a prototype clustering algorithm based on fuzzy sets theory. Starting from this paper, several uncertainty-based clustering methods, relying on different theoretical approaches for modeling uncertainty, have been proposed. This paper presents a systematic literature review of these clustering approaches. In particular, with respect to the Statistical Reasoning System, we first illustrate the connection between Information and Uncertainty from the perspective of the so-called Informational Paradigm, according to which Information is constituted by “Informational ingredients”, specifically the “Empirical Information”, represented by statistical data, and the “Theoretical Information”, consisting of background knowledge and basic modeling assumptions. We then describe the different kinds of uncertainty affecting the Information. Focusing on the uncertainty associated with a particular statistical methodology, i.e. Cluster Analysis, and adopting the Informational Paradigm as theoretical platform, we present a systematic literature review of different uncertainty-based clustering approaches, i.e. Fuzzy clustering, Possibilistic clustering, Shadowed clustering, Rough sets-based clustering, Intuitionistic fuzzy clustering, Evidential clustering, Credibilistic clustering, Type-2 fuzzy clustering, Neutrosophic clustering, Hesitant fuzzy clustering, Interval-based fuzzy clustering, and Picture fuzzy clustering. We thus show how all these clustering approaches are able to manage, in different ways, the uncertainty associated with the two components of the Informational Paradigm, i.e. the Empirical and Theoretical Information.

© 2017 Elsevier Inc. All rights reserved. 
1. Introduction


Statistical reasoning can be viewed as a specific instance of approximate reasoning, where uncertainty affects the various 
ingredients of the reasoning process, which is therefore characterized by “approximation.” 

In particular, Statistical Reasoning Systems embody two types of “informational” ingredients: the Empirical Information, represented by the dataset, and the initial Theoretical Information, which includes basic modeling assumptions, previous knowledge, and other pieces of Theoretical Information concerning the processing assumptions and the cognitive conclusions of the knowledge acquisition process (the “informational gain” obtained by means of appropriate strategies of analysis applied in the above context).

All of the above informational ingredients are affected by some source of uncertainty. For example, the data may be 
imprecisely measured or vaguely defined (e.g. use of linguistic expressions); furthermore, they may only partially represent 
the universe of possible data describing the investigated phenomenon (for instance when they are sampled from a larger 
population). Moreover, the basic modeling assumptions may also be uncertain, and the same is true of the assumptions used 
for processing the data (in fact any particular specification of these assumptions involves uncertainty as to their validity 
in the given research framework). Finally, the results of the statistical analysis reflect the uncertainties associated with 
the various pieces of information used for drawing the conclusions. In this respect, we have to cope with an uncertainty 
propagation process matching a parallel information propagation process, within the same Statistical Reasoning System. 

In the above framework, randomness, imprecision, vagueness and partial ignorance are different types of uncertainty, each requiring a specific treatment. Standard probability theory may not be sufficient for dealing with all of them. We argue that fuzzy sets theory, as well as other uncertainty theories (such as, e.g., Type-2 fuzzy sets theory, Intuitionistic fuzzy sets theory, Rough sets theory, Shadowed sets theory, Credal sets theory and Evidential theory, Possibility and Credibilistic theories, Neutrosophic sets theory, Hesitant sets theory or inferential logic based on conditional probability, seen as a function of the conditioning event), can suitably complement traditional probability theory in order to deal with the complexity of statistical reasoning.

In this connection, the paper will focus on the specific area of Cluster Analysis to illustrate the different theoretical 
approaches used in the literature to manage the uncertainty in the clustering process. 

Fifty years have gone by since the publication of the first paper on clustering based on fuzzy sets theory. In 1965, L.A. Zadeh published “Fuzzy Sets” [332]. Only one year later, the first effects of this seminal paper began to emerge with the pioneering paper on clustering by Bellman, Kalaba and Zadeh [33], in which they proposed a prototype clustering algorithm based on fuzzy sets theory. Starting from this paper, several uncertainty-based clustering methods, relying on different theoretical approaches for modeling uncertainty, have been proposed.

This paper presents a systematic literature review of these clustering approaches.

In particular, the main aim of the paper is to show, in an organic manner, the impressive impact that the seminal papers on fuzzy clustering have had over the last 50 years on different scientific communities (mathematicians, statisticians, computer scientists, and so on). In fact, as we shall see below, a massive and diversified scientific production has characterized those fruitful years. To do this, we define a general theoretical platform, i.e. the so-called Informational Paradigm, for managing the different, organically interconnected kinds of information and uncertainty that characterize Statistical Reasoning methods and, in particular, clustering processes. We then analyze systematically and in detail the Informational Paradigm, the possible uncertainty affecting the different kinds of information, and the theoretical formalisms for managing this uncertainty in different ways, focusing, from a methodological point of view, on fuzzy sets theory and on its more fruitful theoretical extensions and generalizations. Subsequently, we adopt the Informational Paradigm as theoretical platform for the clustering methodology, showing different approaches for managing the uncertainty in the classification process. In this way, we assume that the different uncertainty-based clustering approaches are defined on the basis of the Informational Paradigm. Thus, in different sections of the paper, we review systematically and in detail the most relevant uncertainty-based clustering approaches proposed in the literature for classifying objects, and explain the respective theories used for managing the uncertainty. For each clustering approach, we illustrate the chronology of the various theoretical and methodological contributions, showing, with respect to the Informational Paradigm, the informational ingredients and the uncertainty measures connected to the different clustering approaches. Furthermore, we compare, from a chronological point of view, the different uncertainty-based clustering approaches and the connected uncertainty theories, showing the different timing of the impact of the various uncertainty theories on the respective clustering approaches and hence the different pace at which the theoretical results were assimilated into the respective clustering methodologies.

The paper is organized as follows. Starting from the definition of the Informational Paradigm (Section 2), we illustrate various non-probabilistic formalisms for managing uncertainty in data analysis (Section 3), including fuzzy sets theory and its developments, and theories that manage imprecision and uncertainty in a different way. In Section 4, focusing on fuzzy sets theory and on some of its recent developments (i.e. Type-2 fuzzy sets theory, Intuitionistic fuzzy sets theory, Rough sets theory, Shadowed sets theory, Credal sets theory and Evidential theory, Possibility and Credibilistic theories, Neutrosophic sets theory, Hesitant sets theory and Picture Fuzzy Sets), we present a review of the clustering methods based on these formalisms that are available in the literature. As we shall see, these formalisms manage in different ways the uncertainty associated with the two components of the Informational Paradigm, i.e. the Empirical and Theoretical Information. A summary and some conclusions are presented in Sections 5 and 6, respectively.
2. Informational Paradigm and uncertainty


Following the general approach to Statistical Reasoning called “Informational Paradigm” [61,62], the Information ($I$) is made up of “Informational Ingredients”, which include the “Empirical Information” ($I_E$), i.e. the statistical data, and the “Theoretical Information” ($I_T$), i.e. the background knowledge and basic modeling assumptions. The cognitive conclusions of the process constitute the “Informational Gain” ($I_G$) of the Statistical Reasoning System.

All of these informational ingredients can be affected by various kinds of Uncertainty ($U$). A reasonable list is the following:

- The uncertainty of the relation between the observed data and the “universe” of possible data (in the traditional inferential statistical approach this is managed by means of a probabilistic sampling model).
- The uncertainty associated with the different components of a statistical method: algorithm, procedure, model and so on.
- The margin of error in the measurements of the empirical phenomena.
- The vagueness of the linguistic terms describing the phenomena.
- The partial or total ignorance of a phenomenon in specific observational instances (e.g. missing data), or of the underlying theoretical assumptions.
- The uncertainty deriving from the “granularity” of the terms used in the description of the physical world (this is related to the general notion of “linguistic variable” whose “granules” are “clumps of values drawn together by indistinguishability, similarity, proximity or functionality”). Note that this notion is more general than that of qualitative variable used in Statistics. Specifically, it refers to general cognitive processes, including the definition and use of models, theoretical assumptions, etc., and therefore also to what we called theoretical information in statistical reasoning.


If we view Statistical Reasoning from the perspective of the Informational Paradigm, we can distinguish two parallel processes. On one side, there is the manipulation of the Informational ingredients (from the Initial Information ($I$), through the Processing Assumptions, to the final Informational Gain). On the other side, there is the propagation of Uncertainty: from the initial Uncertainties ($U$), through the Uncertainties associated with the various Informational ingredients introduced into the Statistical Reasoning process, i.e. $U_E$ (the Uncertainty associated with the Empirical Information $I_E$) and $U_T$ (the Uncertainty associated with the Theoretical Information $I_T$), up to the final assessment of the Uncertainty related to the Additional Information obtained by the given Statistical Reasoning System, i.e. the Final Uncertainty ($U_F$) associated with the Final Information ($I_F$) obtained by suitably manipulating and combining the two informational entities of the Informational Paradigm. Note that, by manipulating and combining the informational ingredients, we intrinsically manage the uncertainty, obtaining suitable uncertainty measures capable of improving and increasing information. Therefore, Information and Uncertainty appear to be closely linked, i.e. organically interconnected, so that we can speak of “Information-based Uncertainty” as well as of “Uncertainty-based Information” [147]. For more details, see Coppi [61,62] and Coppi et al. [64].

In Fig. 1, we schematically present these two processes. 

The different kinds of uncertainty associated with the two types of information can be modeled following different 
theoretical formalisms. In the following section, we will consider the formalization of the uncertainty in a non-probabilistic 
manner, by considering the fuzzy sets theory introduced by Lotfi A. Zadeh in 1965 [332]. In addition, in Section 3, we will
illustrate different developments of Zadeh’s theory. 


3. Management of uncertainty in the Informational Paradigm: fuzzy sets theory and other theoretical formalisms 


In the scientific literature of the twentieth century, a few theoretical frameworks for developing Statistical Reasoning have been suggested, from the classical Inferential Paradigm (e.g., [67]) to the Descriptive-Exploratory Paradigm of the French school usually referred to as “Analyse des Données” (e.g., [34]), to the more recent Statistical Learning Paradigm (e.g., Vapnik [295]). However, none of the above frameworks allows for a complete treatment of the various sources of Uncertainty affecting the Statistical Reasoning process. In fact, the main source of Uncertainty investigated in these theoretical frameworks is “randomness”, quite often limited to the “data generation process” managed by means of appropriate probabilistic models [62]. In this area, Zadeh’s fundamental contribution (1965), by introducing the notion of Fuzzy Sets, opened the way to a new development of logical, mathematical and statistical thinking. In close connection with Probability Theory, Fuzzy Sets Theory may provide the necessary tools for a generalized treatment of Uncertainty in Statistics [62], as well as powerful tools for expressing and managing some of the above-mentioned uncertainties and the associated informational entities beyond the traditional domains of probability theory in its standard form [64].


3.1. Fuzzy sets theory and other theoretical formalisms 


In the classical Set Theory introduced by G. Cantor in 1874, the membership of elements in a set is assessed in binary terms according to a Principle of Bivalence: an element either belongs or does not belong to the set. By contrast, in the fuzzy set theory proposed by L.A. Zadeh [332], the membership of elements in a set is assessed in gradual terms, and can be described with the aid of a Membership function valued in the real unit interval [0,1]. Thus, fuzzy sets represent a generalization of classical sets, since the Indicator functions of classical sets are special cases of the membership functions of fuzzy sets, in which membership can only take the values 0 or 1. In fuzzy set theory, classical bivalent sets are usually called "Crisp sets."

Fig. 1. Scheme of the Informational Paradigm and uncertainty.

In the literature on the management of uncertainty from a non-exclusively probabilistic perspective, fuzzy sets theory represents a crucial innovation. Indeed, starting from this theory, a great number of theoretical formalisms have been suggested for managing in different ways the various kinds of uncertainty affecting the two informational entities of the Informational Paradigm (i.e. the empirical and theoretical information), and many new mathematical constructions and theories treating imprecision, inexactness, ambiguity, vagueness, and uncertainty have been developed. Some of these constructions and theories are extensions of fuzzy set theory, while others try to mathematically model imprecision and uncertainty in different ways (Burgin and Chnihin [361]; [99,144]). Note that, while most of the above can be generally categorized as truth-based extensions of fuzzy sets, bipolar fuzzy set theory presents a philosophically and logically different, equilibrium-based generalization of fuzzy sets [339-341].

The most relevant theoretical formalisms are listed in Table 1 and in the timeline illustrated in Fig. 2 (in the list, we also 
include sets and fuzzy sets theories). 

Some of the formalisms shown in Table 1 have been utilized for managing uncertainty in specific domains of Statistics and in particular in Cluster Analysis. In this connection, in the following section, we will present a detailed review of the clustering approaches used to appropriately deal with different kinds of uncertainty in clustering processes.


4. Informational Paradigm and different uncertainty formalisms in the clustering approach 


The Informational Paradigm can be applied to all forms of Statistical Reasoning; in particular, we focus our attention on 
cluster analysis. 

For example, in a non-hierarchical clustering framework, the general optimization problem associated with the clustering process can be formalized as follows:

Optimization (with respect to u and h) of the objective function: g(x; u, h; c, m, d),
subject to possible constraints,

where: g = type of objective function, h = kind of cluster prototype, u = uncertainty measure, x = input data, c = number of clusters, m = parameter for suitably tuning the uncertainty, d = distance measure.
Table 1
Theoretical formalisms for managing the uncertainty (*).

1. sets [45]
2. fuzzy sets [332]
3. interval sets [211]
4. L-fuzzy sets [112]
5. flou sets [110]
6. Boolean-valued fuzzy sets [38]
7. set-valued sets [50,51]
8. type-2 fuzzy sets and type-n fuzzy sets [353]
9. interval-valued fuzzy sets [115,132,263,353]
10. functions as generalizations of fuzzy sets and multisets [159]
11. level fuzzy sets [248]
12. underdetermined sets [213]
13. rough sets [221]
14. intuitionistic fuzzy sets [25]
15. fuzzy multisets [354]
16. intuitionistic L-fuzzy sets [25]
17. rough multisets [117]
18. fuzzy rough sets [212]
19. real-valued fuzzy sets [37]
20. named sets [39]
21. vague sets [109]
22. Q-sets [122]
23. α-level sets [325,326]
24. credal sets [68,69]
25. bipolar fuzzy sets [340]
26. shadowed sets [225]
27. neutrosophic sets [274]
28. genuine sets [89]
29. soft sets [210]
30. intuitionistic fuzzy rough sets [65,66]
31. blurry sets [276]
32. L-fuzzy rough sets [249]
33. hesitant fuzzy sets [290]
34. generalized rough fuzzy sets [106]
35. rough intuitionistic fuzzy sets [283]
36. soft rough fuzzy sets [196]
37. soft fuzzy rough sets [196]
38. soft multisets [7]
39. fuzzy soft multisets [8]
40. picture fuzzy sets [73]

(*) Note that, for completeness, in the list we consider the sets theory; however, it represents the logic-mathematical construction basis for modeling the uncertainty in a probabilistic perspective by means of the probabilistic theory.


Thus, from an informational perspective, the informational ingredients $I_E$ and $I_T$ of the Information $I$ and the Final Information $I_F$ are:


$I = \{I_E = (x),\ I_T = (\text{optimization criterion}, g(\cdot), c, m, d, \text{type of prototype})\}$
$I_F = \{\text{results}: u, h\}$.


The initial uncertainty ($U$) associated with $I$ concerns, respectively, the nature of the input data x ($U_E$) and the theoretical ingredients of the clustering method, i.e. the optimization criterion, g(.), c, m, d and the type of prototype ($U_T$); the final uncertainty ($U_F$) is related to the computed results and their interpretation.

Note that, by manipulating and combining the informational ingredients, we intrinsically manage the uncertainty, obtaining suitable uncertainty measures capable of improving and increasing the informational power associated with the clustering process. The uncertainty measurements are represented directly by u and/or their elaborations.

Note that, since according to the Informational Paradigm the final information is also affected by uncertainty, the computed point uncertainty measurement u (a piece of the final information) is itself affected by uncertainty; therefore, in the knowledge discovery process, the Statistical Reasoning System is not able to completely eliminate uncertainty. No matter how efficient the system is, a halo of uncertainty always permeates the information.

From this perspective, in Cluster Analysis there are different theoretical approaches for properly managing the different 
kinds of uncertainty affecting the Empirical and Theoretical Information of the Informational Paradigm, i.e., respectively, the
data and the various theoretical ingredients of the clustering methods. 

In terms of managing uncertainty through a statistical and probabilistic approach, there are several methods based on 
the use of finite mixture models for clustering of multivariate data observed from a random sample. Such models can be 
fitted by maximum likelihood via the Expectation-Maximization (EM) algorithm (see, among others, [193]).

The most widely used non-probabilistic approach for managing different kinds of uncertainty in the clustering framework is fuzzy sets theory [332]. The literature on fuzzy clustering is vast. As we shall see in Section 4.1, in the early methods (see, e.g., Bezdek [355]), uncertainty is modeled exclusively with respect to the theoretical component of the Informational Paradigm, by means of the concept of membership degree, which represents an uncertainty measurement in the process of assigning objects to the different clusters.

In the literature, there are also several methods in which uncertainty is modeled by considering simultaneously the random (stochastic) and the fuzzy approach, by means of the theory of Fuzzy Random Variables (FRVs). From a theoretical viewpoint, this implies constructing appropriate probability measures for fuzzy sets. Various proposals have been made in the last decades within this perspective. One of the most significant is the notion of FRV introduced by Puri and Ralescu [245] and Klement et al. [146]. Another definition of a fuzzy-valued random variable, which is mathematically equivalent, albeit conceptually different, and can only be stated for the univariate case, is the one proposed by Kwakernaak [154,157] and Kruse and Meyer [153]. For an example of clustering based on FRVs, see Colubi et al. [59].

In the following sections, we will focus our attention on the clustering approaches based on the fuzzy sets theory and 
on different extensions and derivatives of this prolific theory. 
In particular, we present a literature review of the following uncertainty-based clustering approaches (ordered with respect to the first published paper of each approach):


- Fuzzy clustering,
- Possibilistic clustering,
- Shadowed clustering,
- Rough sets-based clustering,
- Intuitionistic fuzzy clustering,
- Evidential clustering or credal clustering or Belief clustering,
- Credibilistic clustering,
- Type-2 fuzzy clustering,
- Neutrosophic clustering,
- Hesitant fuzzy clustering,
- Interval-based fuzzy clustering,
- Picture fuzzy clustering.


4.1. Fuzzy clustering 


Fuzzy Sets (FSs) theory, as already mentioned, is a powerful theory introduced by Zadeh [332] more than 50 years ago for suitably managing uncertainty and imprecision in knowledge discovery processes. Mathematically, a Fuzzy Set (FS) (or Type-1 Fuzzy Set, T1FS) can be defined as follows.

Let $X$ be an initial universe. A Fuzzy Set (FS) (or T1FS) $A \subset X$ is defined as follows:


$$A = \{(x, \mu_A(x)) : x \in X\}$$


where $\mu_A(x)$ is called the “membership function” of $x$ to $A$, with $\mu_A(x) \in [0,1]$.

The definition of a FS is a simple extension of the definition of a classical set, in which the membership function $\mu_A(x)$ is restricted to either 0 or 1, i.e. $\mu_A(x) \in \{0,1\}$.
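
As a minimal illustration of the definition above, the following Python sketch stores a fuzzy set on a small finite universe as a mapping from elements to membership degrees, and checks that a crisp set is the special case in which every degree is 0 or 1. The universe, the triangular membership function and the names `triangular` and `is_crisp` are illustrative assumptions, not taken from the reviewed literature.

```python
# Illustrative sketch: a fuzzy set A on a finite universe X stored as {x: mu_A(x)}.
# The universe and both membership functions below are hypothetical examples.

def triangular(x, a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def is_crisp(fuzzy_set):
    """A fuzzy set is crisp when every membership degree is exactly 0 or 1."""
    return all(mu in (0.0, 1.0) for mu in fuzzy_set.values())


X = [0, 5, 10, 15, 20]  # hypothetical universe

# Fuzzy set "about 10": membership grows towards 10 and decays afterwards.
about_10 = {x: triangular(x, 0, 10, 20) for x in X}

# Crisp set {x in X : 5 <= x <= 15}: its indicator function only takes values 0 or 1.
between_5_and_15 = {x: 1.0 if 5 <= x <= 15 else 0.0 for x in X}

print(about_10)
print(is_crisp(about_10), is_crisp(between_5_and_15))
```

Here the elements 5 and 15 belong to "about 10" only partially, whereas in the crisp set they are either fully in or fully out.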

Fuzzy Sets theory immediately showed its theoretical and applicative potential in cluster analysis, where it has been widely used.

Traditional clustering methods mandate that an object must belong to precisely one cluster. Such a requirement is found 
to be too restrictive in many applications. In practice, an object may display characteristics of different clusters. In such 
cases, an object should belong to more than one cluster and as a result, cluster boundaries necessarily overlap. The fuzzy 
set representation of clusters makes it possible for an object to belong to multiple clusters with a membership degree 
between 0 and 1 [169]. 

In Statistical Reasoning, clustering constitutes the first statistical technique which lends itself to a treatment based on 
the fuzzy sets theory [332]. The rationale for it lies in the recognition of the vague nature of the cluster assignment task. In 
the literature, several fuzzy clustering methods have been proposed and applied in many different fields [78]. 

As already mentioned, the pioneering fuzzy clustering algorithm was proposed in 1966 by Bellman, Kalaba and Zadeh [33]. Subsequently, other authors, i.e. Wee [306], Flake and Turner (1968), Gitman and Levine [111] and Ruspini [260-262], contributed to the original clustering algorithm based on fuzzy sets theory. As remarked by Bezdek in his first paper on fuzzy clustering, the previous papers “trace the evolution of fuzzy sets as a theoretical basis for cluster analysis” [35]. With regard to Ruspini’s approach, however, “the original algorithm [...] is said to be rather difficult to implement. Its computational efficiency should be weak and its generalization to more than two clusters should be of little success. But it was the pioneer for a successful development of this approach” [28]. Ruspini’s method opened the door for further research, especially thanks to the idea of fuzzy c-partitions in cluster analysis [357].

The Fuzzy c-Means (FcM) clustering method, independently introduced by Dunn [356] and Bezdek [35] and then analytically formalized and investigated from a more general perspective by Bezdek [355], is the first method that is computationally efficient and powerful. For this reason, Bezdek’s method is the best-known and most widely used clustering technique in this body of literature. Several clustering methods based on the fuzzy approach have been developed by suitably extending the original method proposed by Bezdek in 1981 [78].

Let $X = \{x_{is} : i = 1, \dots, n;\ s = 1, \dots, p\} = \{x_i = (x_{i1}, \dots, x_{is}, \dots, x_{ip})' : i = 1, \dots, n\}$ be the data matrix, where $x_{is}$ represents the sth variable observed on the ith object and $x_i$ represents the vector of the ith observation.

The FcM clustering method proposed by Bezdek [355] can be formalized in the following way (for the iterative solutions 
see Bezdek, [355]): 


$$\min: \sum_{i=1}^{n} \sum_{k=1}^{c} u_{ik}^{m}\, d^2(x_i, h_k) \quad \text{s.t.} \quad \sum_{k=1}^{c} u_{ik} = 1,\ u_{ik} \ge 0 \tag{1}$$

where $u_{ik}$ indicates the membership degree of the ith object to the kth cluster; $d^2(x_i, h_k) = \|x_i - h_k\|^2$ represents the squared Euclidean distance between the ith object and the centroid of the kth cluster; $h_k = (h_{k1}, \dots, h_{ks}, \dots, h_{kp})'$ represents the kth centroid, where $h_{ks}$ indicates the sth component (sth variable) of the kth centroid vector; and $m > 1$ is a parameter that controls the fuzziness of the partition.
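
The text defers to Bezdek [355] for the iterative solutions of (1). As a hedged illustration, the Python sketch below alternates the two standard closed-form updates, $u_{ik} = 1 / \sum_{k'} (d^2(x_i, h_k) / d^2(x_i, h_{k'}))^{1/(m-1)}$ and $h_k = \sum_i u_{ik}^m x_i / \sum_i u_{ik}^m$, until the memberships stabilize. The function names, the explicit initial centroids `H0`, the stopping rule and the toy data are assumptions made for this example only; it is a sketch, not a reference implementation.

```python
# Minimal FcM sketch (standard library only). All names and the toy data are hypothetical.

def sq_dist(x, h):
    """Squared Euclidean distance d^2(x_i, h_k)."""
    return sum((a - b) ** 2 for a, b in zip(x, h))


def update_memberships(X, H, m):
    """u_ik = 1 / sum_k' (d^2(x_i, h_k) / d^2(x_i, h_k'))^(1 / (m - 1))."""
    U = []
    for x in X:
        d2 = [sq_dist(x, h) for h in H]
        zeros = [k for k, d in enumerate(d2) if d == 0.0]
        if zeros:
            # x coincides with one or more centroids: share full membership among them.
            U.append([1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(H))])
        else:
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            U.append([v / total for v in inv])
    return U


def update_centroids(X, U, m):
    """h_k = sum_i u_ik^m x_i / sum_i u_ik^m (assumes no cluster has all-zero weights)."""
    n, p, c = len(X), len(X[0]), len(U[0])
    H = []
    for k in range(c):
        w = [U[i][k] ** m for i in range(n)]
        total = sum(w)
        H.append([sum(w[i] * X[i][s] for i in range(n)) / total for s in range(p)])
    return H


def fuzzy_c_means(X, H0, m=2.0, n_iter=100, tol=1e-9):
    """Alternate the two updates, starting from the initial centroids H0."""
    H = [list(h) for h in H0]
    U = update_memberships(X, H, m)
    for _ in range(n_iter):
        H = update_centroids(X, U, m)
        U_new = update_memberships(X, H, m)
        shift = max(abs(a - b) for row, new in zip(U, U_new) for a, b in zip(row, new))
        U = U_new
        if shift < tol:
            break
    return U, H


# Toy data: two well-separated groups of three objects each (p = 2 variables).
X = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
U, H = fuzzy_c_means(X, H0=[X[0], X[3]])
for row in U:
    print([round(u, 3) for u in row])
```

Each row of `U` sums to one, as required by the constraint in (1). Values of `m` closer to 1 give memberships closer to a crisp partition, while larger values make the partition fuzzier.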

With respect to the Informational Paradigm ($I$) and the informational ingredients of the FcM clustering method (1), the Empirical Information ($I_E$) is represented by $x_i$ ($i = 1, \dots, n$) (the elements of the data matrix), and the Theoretical Information ($I_T$) is represented by the theoretical informational ingredients of the clustering method, i.e. the minimization criterion, the additive quadratic objective function, $c$ (number of clusters), $m$ (fuzziness parameter), $d^2(\cdot)$ (squared Euclidean distance measure) and the type of prototype (centroid). The final information is constituted by $u_{ik}$ ($i = 1, \dots, n$; $k = 1, \dots, c$) (the membership degrees of the objects to the clusters) and $h_k$ ($k = 1, \dots, c$) (the centroids of the clusters). In symbols, we have:


$I = \{I_E = (x_i),\ I_T = (\text{minimization criterion, additive quadratic objective function, } c, m, \text{ squared Euclidean distance measure, type of prototype}) : i = 1, \dots, n\}$
$I_F = \{\text{results}: u_{ik}, h_k : i = 1, \dots, n;\ k = 1, \dots, c\}$.


Thus, uncertainty can affect all the previous components of the Informational Paradigm. 

In the clustering version proposed by Bezdek [355], the uncertainty is measured directly by $u_{ik}$ ($i = 1, \dots, n$; $k = 1, \dots, c$), i.e. we consider only the uncertainty associated with the process of assigning the objects to the clusters; no other kinds of uncertainty are taken into account in the original FcM clustering method. In particular, $u_{ik}$ represents a point measure of the uncertainty, in the sense that for each membership degree we have a single value representing the measurement of the uncertainty in the assignment of each object to each cluster. Derived uncertainty measurements can be obtained from $u_{ik}$ by defining suitable within- and between-cluster variability measures for the computed partition.

In the literature, there are several extensions of (1) in which the Empirical Information (represented by the input data) is affected by uncertainty, imprecision, vagueness and so on.

In particular, considering the case in which the imprecision of the data is formalized in a fuzzy manner (fuzzy data), we point out the following contributions on fuzzy clustering of fuzzy data: Sato and Sato [266], Hathaway et al. [124], Yang and Ko [320], Yang and Liu [321], Takata et al. [282], Auephanwiriyakul and Keller [26], Butkiewicz [41], Hung and Yang [128], D'Urso and Giordani [80], Zarandi and Razaee [334], Coppi et al. [63]. Interesting applications of fuzzy clustering methods for fuzzy data in e-health and tourism have been proposed, respectively, by D'Urso et al. [74] and D'Urso et al. [76,78]. Recently, robust fuzzy clustering methods for fuzzy data have been suggested by D'Urso and De Giovanni [77]. Using a “Partitioning Around Medoids” (PAM) approach, they first proposed a timid robustification of fuzzy clustering for LR fuzzy data; subsequently, they proposed three robust fuzzy clustering models based, respectively, on a noise cluster, an exponential metric and trimming rules. For a survey on fuzzy clustering of fuzzy data, see D'Urso [75].

In the literature, there are also various clustering methods for classifying imprecise data modeled as interval-valued data. 
See, e.g., El-Sonbaty and Ismail (1998), D'Urso and Giordani [80,81], de Carvalho [86], de Carvalho and Tenorio [85], D'Urso 
et al. [105], D'Urso and Leski [82]. 

An interesting approach for managing the imprecision affecting the data is granular computing [29,228,233]. As Zadeh [331] wrote, “fuzzy information granulation underlies the remarkable human ability to make rational decisions in an environment of imprecision, partial knowledge, partial certainty and partial truth”. In this respect, in a clustering framework, in terms of granular computing, a cluster can be interpreted as an information granule that presents its objects on a coarser and more granular level [107]. Useful references on clustering of granular data are, e.g., [104,107,168,231,232,241,265].

In addition, in the literature there are many methods based on extensions of, or alternatives to, the objective function represented in (1) (e.g. methods with entropy regularization, relational methods, hybrid methods, robust methods and so on). In all these cases, the new additional parameters considered in the respective objective functions represent other ingredients of the Theoretical Information, themselves affected by uncertainty.

For interested readers, useful references to the extensive literature on fuzzy clustering can be found in the chapter on fuzzy clustering by D'Urso in the Handbook of Cluster Analysis [42], the seminal monograph by Bezdek [355], the books by Jain and Dubes [133], de Oliveira and Pedrycz [87], and Miyamoto, Ichihashi and Honda [207] and, e.g., the following journals devoted to this topic: Fuzzy Sets and Systems, IEEE Transactions on Fuzzy Systems, Information Sciences, Pattern Recognition, Applied Soft Computing, Soft Computing, Pattern Recognition Letters.


4.2. Possibilistic clustering 


In FcM (see (1)), for each object, the sum of the membership degrees in the clusters must be equal to one. Such a 
constraint may cause meaningless results, especially when noise is present. Following the Possibility Theory [333], to avoid 
this drawback, it is possible to relax the constraint, leading to the so-called Possibilistic c-Means (PcM) clustering, providing 
“degrees of compatibility” of an object with each of the clusters [78]. Note that, in general the Possibility Theory is related 
to the Fuzzy Sets Theory by defining the concept of a possibility distribution as a fuzzy restriction which acts as an elastic 
constraint on the value that may be assigned to a variable [333]. A revised fuzzy set interpretation of possibility theory was 
introduced by Klir [148]. 

In the possibilistic perspective, $u_{ik}$ represents the degree of possibility of object i belonging to cluster k or, in other
terms, the degree of compatibility of the profile $x_i$ with the characteristics of cluster k embodied by its prototype $h_k$. The
FcM objective function is consequently modified, by introducing an additive “penalization” term which takes care of the
balance between the fuzziness of the clustering structure and the “compactness” of the clusters [78].

Fig. 3. Three zones in a shadowed set divided by the threshold λ (source: [337]).

The first PcM clustering method was proposed by Krishnapuram and Keller [151]. These authors modify the objective
function in (1), additively including a second term that forces the $u_{ik}$ to be as large as possible in order to avoid trivial
solutions, and relax the constraints by considering the conditions of a possibilistic partition, i.e.
$\sum_{k=1}^{c} u_{ik} > 0$, $0 \le u_{ik} \le 1$.
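
To make the effect of the relaxed constraint concrete, the following minimal sketch computes possibilistic degrees with the update commonly given for the Krishnapuram and Keller objective, $u_{ik} = 1 / (1 + (d_{ik}^2/\eta_k)^{1/(m-1)})$. The function name, the array layout, the squared Euclidean distance and the scale parameters `eta` are assumptions made for illustration; they are not taken from this review.

```python
import numpy as np

def pcm_typicalities(X, H, eta, m=2.0):
    """Possibilistic degree u[i, k] of object i with respect to cluster k.

    X: (n, p) objects, H: (c, p) prototypes, eta: (c,) positive scale
    parameters (hypothetical values, normally estimated from the data).
    """
    # Squared Euclidean distances between every object and every prototype.
    d2 = ((X[:, None, :] - H[None, :, :]) ** 2).sum(axis=2)
    # Each degree depends only on the distance to its own prototype,
    # so the rows are not forced to sum to one.
    return 1.0 / (1.0 + (d2 / eta[None, :]) ** (1.0 / (m - 1.0)))

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])  # third object is an outlier
H = np.array([[0.0, 0.0], [1.0, 0.0]])
U = pcm_typicalities(X, H, eta=np.array([1.0, 1.0]))
```

In this toy example the outlier receives a low degree of compatibility with both clusters, whereas the sum-to-one constraint of FcM would force its memberships to add up to one.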

Barni et al. [31] highlighted some limitations in the use of the PcM clustering method proposed by Krishnapuram and
Keller [152], in that it may lead to trivial solutions consisting of “coincident clusters”. This tendency may be dealt
with by means of an appropriate initialization of the computational procedure (see [63]). In an attempt to solve this potential
problem, Krishnapuram and Keller [152] suggested an alternative objective function to that of the previous method.

A possibilistic clustering method based on a robust approach using Vapnik’s ε-insensitive estimator [295] has been
proposed by Leski [160].

A modification of the objective function of the first method proposed by Krishnapuram and Keller [151] has also been
suggested by Yang and Wu [322]. Other prototype-based possibilistic clustering methods have been developed by Barni
and Gualtieri [30], Ménard et al. [194] and Xie et al. [358]. Recently, possibilistic clustering methods have been proposed by
Amirkhani et al. [10], Ammar et al. [11], Chang et al. [49], Maciel et al. [181], Xenaki et al. [311] and Wang and Hung [303].

Following a relational approach, possibilistic clustering methods have been proposed by Krishnapuram et al. [150] and 
De Caceres et al. [84]. 

By considering a hybrid approach, in which membership and typicality degrees are simultaneously generated, a fuzzy
possibilistic clustering method has been proposed by Pal et al. [218]. Recently, a fuzzy-possibilistic clustering based on
Bayesian inference has been proposed by Abadpour [1]. Subsequently, Abadpour [2] suggested a Bayesian inference-based
fuzzy possibilistic clustering in a spatial context. Other hybrid methods have been suggested by Zhang and Leung [336],
Wu et al. [310], Aparajeeta et al. [16], Askari et al. [20] and Kannan et al. [140].

In all the mentioned possibilistic clustering methods the uncertainty is measured, in a point manner, by means of the
degree of possibility (or degree of compatibility); for the hybrid methods, the uncertainty measures are, simultaneously, the
membership and typicality degrees.

Note that, also in the possibilistic domain, there are clustering methods in which the empirical information is affected by
uncertainty, i.e. the data are imprecise. An example of a possibilistic method in which imprecise data are suitably managed
has been proposed by Coppi et al. [63].


4.3. Shadowed clustering 


Shadowed clustering is based on the concept of shadowed sets [225,229], which are defined over fuzzy sets. The main 
difference is that the shadowed set divides the precise numeric (point) membership values into three zones: exclusion, 
shadowed, and core zones (see Fig. 3). They are determined by thresholds $\lambda_k \in [0, 0.5)$. In particular, the values of the
membership degrees are modified as follows ($i = 1, \ldots, n$; $k = 1, \ldots, c$):

$$
u_{ik} =
\begin{cases}
1 & u_{ik} \ge 1 - \lambda_k \\
u_{ik} & \lambda_k < u_{ik} < 1 - \lambda_k \\
0 & u_{ik} \le \lambda_k.
\end{cases}
$$

In other words, the shadowed set is a modified fuzzy set. The three zones are obtained by enhancing the membership
values to 1 in the core zone, reducing the membership values to 0 in the exclusion zone, and keeping the membership values
unchanged in the shadow zone.
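
The three-zone rule described above can be sketched in a few lines. The function name and the example thresholds are assumptions; in particular, the thresholds are simply given here, whereas in the shadowed set literature they are derived from the data.

```python
import numpy as np

def shadow(U, lam):
    """Map fuzzy memberships to core (1), exclusion (0) or shadow (unchanged).

    U: (n, c) membership degrees; lam: (c,) thresholds in [0, 0.5),
    one per cluster (hypothetical values in the example below).
    """
    U = np.asarray(U, dtype=float)
    lam = np.asarray(lam, dtype=float)
    S = U.copy()
    S[U >= 1.0 - lam] = 1.0  # core zone: membership raised to 1
    S[U <= lam] = 0.0        # exclusion zone: membership reduced to 0
    return S                 # remaining entries lie in the shadow zone

U = np.array([[0.9, 0.1],
              [0.6, 0.4],
              [0.2, 0.8]])
S = shadow(U, lam=[0.25, 0.25])
```

With these thresholds the first and third objects become sure members of one cluster and sure non-members of the other, while the second object keeps its original memberships and stays in the shadow of both clusters.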

At first sight, shadowed sets seem to be identical to rough sets (see Section 4.4). As remarked by Peters et al. [238], this
is only true with respect to the categorization of the objects into three classes: (1) sure members, (2) sure non-members, and
(3) objects in-between. A closer look reveals distinct differences between shadowed and rough sets (Peters et al. [238]):


- in contrast to rough sets, the equivalence classes are defined dynamically in shadowed sets, i.e. the respective thresholds
are derived from the data sets, whereas in rough sets the thresholds are determined by a user;

- in rough sets, an object must belong to at least two upper approximations (the counterpart of shadowed regions) if it
is not a member of any lower approximation (the counterpart of a core); this is not true for shadowed sets.


With respect to contributions in cluster analysis, in the first paper on shadowed sets, Pedrycz [225] introduced the new
concept of shadowed sets and its use in a clustering framework. Subsequently, interesting developments concerning the
use of shadowed sets in cluster analysis have been proposed by Pedrycz [226], in which the author suggests an
interpretation of clusters in the framework of shadowed sets, and Mitra et al. [204], in which a shadowed c-means clustering
integrating fuzzy and rough clustering is suggested. In particular, in the proposed method, “the elements corresponding to
the shadowed region lie in the zone of uncertainty, and are treated as in FCM. However, the members of the exclusion re- 
gion are incorporated in a slightly different manner. Here the fuzzy weight factor for the exclusion is designed to have the 
fuzzifier raised to itself, in the form of a double exponential” [204]. By taking into account these aspects, the centroids are 
then suitably determined. “This arrangement causes a much wider dispersion and a very low bias factor for elements which 
can generally be considered outside the class under discussion or most definitely, the exclusion members. This prevents the 
mean from getting drifted from its true value. It also minimizes the effect of noise and outliers. The threshold to induce the 
core, shadowed and exclusion region is automatically calculated through a functional optimization [...]. The mean [of the 
mathematical formula of the centroids] basically tries to first get a coarse idea regarding the cluster prototype [...] and then 
proceeds to tune and refine this value using data from the shadowed and exclusion region. This enables a better estimation 
of the actual cluster prototypes” [204]. For more details, see Mitra et al. [204]. 

Zhou et al. [135] proposed a shadowed set-based rough-fuzzy clustering. Mitra and Kundu [202] used the shadowed
c-means clustering for satellite image segmentation. Wang and Wang [302] suggested a feature weighting fuzzy clustering
integrating rough sets and shadowed sets. Zhang et al. [337] proposed a clustering method based on spatial shadowed fuzzy
c-means clustering and I-Ching Operators. See also Mitra et al. [205], Pedrycz [227], Peters et al. [238] and Shi et al. [271].

Within an Informational-Uncertainty perspective, the shadowed sets-based clustering methods are able to manage the
uncertainty related to the interpretation of the results. In particular, the shadowed sets are used to interpret the fuzzy cluster
partition and to distinguish between cluster cores and cluster shadows with ambiguously assigned data points.


4.4. Rough set-based clustering 


In Section 4.1 we have illustrated the fuzzy approach to cluster analysis. Note that, in some cases, the fuzzy degree of 
membership may be too descriptive for interpreting clustering results [169]. Rough set-based clustering provides a solution 
that is less restrictive than standard clustering and less descriptive (specific) than fuzzy clustering [169]. 

Rough set theory has made substantial progress as a classification tool in Statistical Reasoning. The basic concept of 
representing a set as lower and upper approximations can be used in a broader context such as clustering. Clustering in 
relation to rough set theory is attracting increasing interest among researchers [169]. 

The notion of rough sets was proposed by Polish computer scientist Pawlak [221,222]. More formal properties and bound- 
aries of rough sets can be found in Pawlak [222] and subsequent works. The initial and basic theory of rough sets is some- 
times referred to as "Pawlak Rough Sets" or "classical rough sets", as a means to distinguish from more recent extensions and 
generalizations. 

The theory of rough sets has emerged as another mathematical tool to manage uncertainty that arises from granularity
in the domain of discourse, that is, from the indiscernibility between objects in a set. “The intention is to approximate a
rough (imprecise) concept in the domain of discourse by a pair of exact concepts, called the lower and upper approximations. 
These exact concepts are determined by an indiscernibility relation on the domain, which, in turn, may be induced by a given 
set of attributes ascribed to the objects of the domain. The lower approximation is the set of objects definitely belonging to 
the vague concept, whereas the upper approximation is the set of objects possibly belonging to the same” [199]. A scheme 
of a rough set is illustrated in Fig. 4. 

Since the development of rough sets, extensions and generalizations have continued to evolve. Initial developments fo- 
cused on similarities and differences with fuzzy sets. While some literature argues that these concepts are different, other 
literature considers rough sets as a generalization of fuzzy sets - as represented through either fuzzy rough sets or rough 
fuzzy sets. Pawlak et al. [223] maintain that fuzzy and rough sets should be treated as being complementary to each other, 
addressing different aspects of uncertainty and vagueness. 

In the literature, several generalizations of rough sets have been introduced, e.g.:


• rough multisets [117];

• fuzzy rough sets, which extend the rough set concept through the use of fuzzy equivalence classes [212];

• alpha rough set theory (α-RST), a generalization of rough set theory that allows approximation using fuzzy concepts
[246];

• intuitionistic fuzzy rough sets [65];

• generalized rough fuzzy sets [106];

• rough intuitionistic fuzzy sets [283];

• soft rough fuzzy sets and soft fuzzy rough sets [196];

• composite rough sets [335].


Fig. 4. Rough set approximations (i.e. lower and upper approximations) (source: [169]).


An interesting discussion on rough fuzzy sets and fuzzy rough sets is shown in Dubois and Prade [102]. 

With respect to the specific statistical area of cluster analysis, rough sets theory has been fruitfully utilized in recent
years. The first contributions on rough set-based clustering have been suggested by Lingras and West [170,171], do Prado et al.
[100] and Voges et al. [299,300].

In rough clustering each cluster has two approximations, a lower and an upper approximation. The lower approximation 
is a subset of the upper approximation. The members of the lower approximation belong certainly to the cluster, therefore 
they cannot belong to any other cluster. The data objects in an upper approximation may belong to the cluster. Since their 
membership is uncertain they must be a member of an upper approximation of at least another cluster [236]. In particular, 
Lingras and West [170,171] introduced a clustering method called rough c-means (RCM), which describes a cluster by a pro- 
totype (center) and a pair of lower and upper approximations. The lower and upper approximations are different weighted 
parameters that are used to compute the new centers. 
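
The mechanics just described can be illustrated with one assignment-and-update pass of a rough c-means style procedure. This is a sketch under stated assumptions, not the exact algorithm of [170,171]: the distance-difference threshold `eps`, the weight `w_lower`, the use of the boundary (upper minus lower approximation) in the update, and all names are chosen for illustration.

```python
import numpy as np

def rough_cmeans_step(X, V, eps=1.0, w_lower=0.7):
    """One pass: build lower/upper approximations, then update the centers.

    X: (n, p) objects, V: (c, p) current centers (float arrays).
    eps and w_lower are hypothetical tuning parameters.
    """
    n, c = X.shape[0], V.shape[0]
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)  # (n, c) distances
    nearest = d.argmin(axis=1)
    # A cluster is a candidate for object i if it is almost as close as the nearest one.
    upper = (d - d[np.arange(n), nearest][:, None]) <= eps
    # Objects with a single candidate belong to that cluster's lower approximation.
    lower = upper & (upper.sum(axis=1, keepdims=True) == 1)
    boundary = upper & ~lower
    V_new = V.copy()
    for k in range(c):
        lo, bd = X[lower[:, k]], X[boundary[:, k]]
        if len(lo) and len(bd):
            V_new[k] = w_lower * lo.mean(axis=0) + (1.0 - w_lower) * bd.mean(axis=0)
        elif len(lo):
            V_new[k] = lo.mean(axis=0)
        elif len(bd):
            V_new[k] = bd.mean(axis=0)
    return lower, upper, V_new

X = np.array([[0.0], [1.0], [5.0], [9.0], [10.0]])
V = np.array([[0.5], [9.5]])
lower, upper, V_new = rough_cmeans_step(X, V)
```

In this example the object at 5.0 is equidistant from both centers, so it falls into the upper approximations of both clusters and into neither lower approximation; the other objects are sure members of exactly one cluster.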

A refined rough c-means clustering has been proposed by Peters [236]. 

Note that there is a crucial difference with fuzzy set theory where we also have overlapping clusters: in fuzzy set theory,
an object can belong to many sets; in rough sets the memberships to two or more sets indicate that there is insufficient
information to determine the actual membership to one and only one cluster [236]. See, e.g., Dubois and Prade [102], for
more details.

In Fig. 5, we report some graphics drawn by Peters et al. [238], which illustrate examples of the differences
between traditional c-means, fuzzy c-means and rough c-means in terms of membership degrees. For more details, see Peters
et al. [238].

In this connection, a combination of fuzzy and rough sets provides a promising approach, insofar as the two procedures 
complement each other to some extent. In fact, various clustering methods combining rough and fuzzy sets have been 
proposed in the last decade. 

In particular, we focus our attention on two rough-fuzzy clustering methods which, as remarked by Peters et al. [238],
while named identically, integrate fuzzy concepts into the rough c-means in slightly different ways:


• Mitra et al.'s Rough-Fuzzy Clustering [206]: The authors proposed a hybrid rough-fuzzy clustering algorithm (RFCM) with
fuzzy lower approximations and fuzzy boundaries. The RFCM mainly differs from the original rough clustering method
(RCM) [238] in the following:

(i) with respect to the assignment of the objects to a lower approximation or a boundary: Euclidean distances used in
rough c-means to distinguish between the objects in the boundary and in the lower approximations are replaced
with membership degrees obtained by fuzzy clustering;

(ii) with respect to the calculation of the means: when calculating the means in rough clustering, the objects in the lower
approximation as well as in the boundary are weighted based on their respective fuzzy membership degrees.

• Maji and Pal’s Rough-Fuzzy Clustering [183]: the authors propose a variant of Mitra et al.’s [206] hybrid rough-fuzzy
clustering algorithm. All objects in a lower approximation have the same influence on the determination of their means
and are independent from other clusters. Hence, they propose a rough-fuzzy c-means with crisp lower approximations
and fuzzy boundaries. Obviously, in Maji and Pal’s [184] method the lower approximation has a higher impact on
clustering in comparison to Mitra et al.’s [206] approach where the objects in the lower approximation are weighted by factors
between 0 and 1 [238].


Fig. 5. Membership degrees for traditional c-means, fuzzy c-means and rough c-means (source: [238]). We consider the
number of clusters of the original paper, i.e. two clusters, and the original notation of the membership degree, i.e., A. The
figure shows, respectively, the data pattern (grouping objects in clusters) and the membership degrees for traditional, fuzzy
and rough c-means clustering.


In addition, Maji and Pal [184] also proposed a rough set-based generalized fuzzy c-means algorithm which they called
RFPCM (rough fuzzy possibilistic c-means). The algorithm merges fuzzy and possibilistic approaches and rough k-means in
such a way that the three clustering algorithms it includes can be derived from it. As in Maji and Pal’s [183] RFCM, a cluster
is approximated by a crisp lower approximation and a fuzzy boundary. Additionally, in their RFPCM, the boundary can be
characterized by possibilistic elements.

Fig. 6. Scheme of the methods of Mitra et al. [206], Maji and Pal [183] and Maji and Pal [184] (source: [238]).

In Fig. 6, we show the original graphic drawn by Peters et al. [238], which summarizes Mitra et al.’s [206] and Maji 
and Pal’s [183] versions of the rough-fuzzy c-means clustering, and rough fuzzy possibilistic c-means clustering proposed by 
Maji and Pal [184]. 

In Mitra et al. [206], the centroids are calculated as a weighted average of the fuzzy lower approximation and the
fuzzy boundary. In Maji and Pal [183] and Maji and Pal [184], the centroids are calculated as a weighted average of
the crisp lower approximation and the fuzzy boundary. The computation of the centroids is modified to include the effects of both
the memberships (i.e. fuzzy memberships for the first method, fuzzy and possibilistic memberships for the second one)
and the lower and upper bounds.
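
The difference between a fuzzy and a crisp lower approximation in the centroid update can be sketched as follows. This is an illustrative reading of the verbal description above, not the authors' exact formulas: the weights `w` and `1 - w`, the fuzzifier `m`, and all names are assumptions.

```python
import numpy as np

def rough_fuzzy_centroid(X, u, lower, boundary, w=0.7, m=2.0, fuzzy_lower=False):
    """Centroid of one cluster from its lower approximation and its boundary.

    X: (n, p) objects; u: (n,) fuzzy memberships to this cluster;
    lower, boundary: (n,) boolean masks. fuzzy_lower=True weights the lower
    approximation by u**m (fuzzy lower approximation); fuzzy_lower=False
    gives every lower-approximation object the same influence (crisp lower
    approximation). The boundary is always weighted by u**m.
    """
    um = u ** m

    def weighted_mean(mask, weights):
        return (weights[mask][:, None] * X[mask]).sum(axis=0) / weights[mask].sum()

    lo = weighted_mean(lower, um if fuzzy_lower else np.ones_like(u)) if lower.any() else None
    bd = weighted_mean(boundary, um) if boundary.any() else None
    if lo is not None and bd is not None:
        return w * lo + (1.0 - w) * bd
    return lo if lo is not None else bd

X = np.array([[0.0], [2.0], [10.0]])
u = np.array([1.0, 0.5, 0.5])
lower = np.array([True, True, False])
boundary = np.array([False, False, True])
crisp = rough_fuzzy_centroid(X, u, lower, boundary, fuzzy_lower=False)
fuzzy = rough_fuzzy_centroid(X, u, lower, boundary, fuzzy_lower=True)
```

With a crisp lower approximation both sure members count equally, so the centroid is pulled further towards the object at 2.0 than in the fuzzy variant, where that object is down-weighted by its membership degree.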

Within the Informational Paradigm perspective, as in the case of the previous rough-fuzzy clustering methods, the un- 
certainty affecting the information is managed by means of the concepts of lower and upper approximations and fuzzy 
boundary. 

An interesting description of the basic concepts of rough clustering based on k-means, genetic algorithms, Kohonen
self-organizing maps, and support vector clustering is found in Lingras and Peters [169]. Their article also includes a review of
rough cluster validity measures, and applications of rough clustering to diverse substantive areas.

Other clustering methods based on rough sets have been proposed recently by Cai and Verbeek [43], Hamidzadeh et al.
[123], Shi et al. [271] and Pacheco et al. [217].

For a survey of rough clustering and its extensions and derivatives see also Peters et al. [238].

References to several contributions on rough clustering, its extensions and derivatives suggested in the literature are
listed chronologically in Table 2.


4.5. Intuitionistic fuzzy clustering 


As mentioned above, since Zadeh introduced fuzzy sets in 1965, various approaches and theories treating imprecision 
and uncertainty have been proposed. In 1986, Atanassov [25] introduced the concept of an Intuitionistic Fuzzy Set (IFS) 
which is characterized by two functions expressing respectively the degree of belonging (degree of membership) and the 
degree of non-belonging (degree of non-membership). 

Mathematically speaking, an Intuitionistic Fuzzy Set can be defined as follows. 

Let X be an initial universe. An Intuitionistic Fuzzy Set (IFS) $A \subset X$ is defined as follows:


$$A = \{(x, \mu_A(x), \nu_A(x)) : x \in X\}$$


where $\mu_A(x)$ represents the degree of membership of x to A, $\nu_A(x)$ indicates the degree of non-membership of x to A with
the constraints:


$$\mu_A(x) \in [0, 1], \quad \nu_A(x) \in [0, 1] \quad \text{and} \quad 0 \le \mu_A(x) + \nu_A(x) \le 1$$


and then $\pi_A(x) = 1 - (\mu_A(x) + \nu_A(x))$ indicates the degree of hesitancy of x to A.
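
As a small worked illustration of the definition, the sketch below checks the constraints on a membership/non-membership pair and returns the degree of hesitancy. The function name and the numerical values are assumptions chosen for the example.

```python
def hesitancy(mu, nu):
    """Degree of hesitancy 1 - (mu + nu) of an intuitionistic fuzzy element.

    Raises ValueError if the pair violates the IFS constraints
    mu in [0, 1], nu in [0, 1] and mu + nu <= 1.
    """
    if not (0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0):
        raise ValueError("not a valid (membership, non-membership) pair")
    return 1.0 - (mu + nu)

h_partial = hesitancy(0.5, 0.25)   # some hesitancy remains
h_fuzzy = hesitancy(0.75, 0.25)    # ordinary fuzzy case: non-membership = 1 - membership
```

When the non-membership degree equals one minus the membership degree, the hesitancy vanishes and the element behaves as in an ordinary fuzzy set.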

As remarked by Grzegorzewski and Mrówka [116], “Atanassov’s sets give us a very natural tool for modeling preferences.
Sometimes it seems to be more natural to describe imprecise and uncertain opinions not only by membership functions. It
is due to the fact that in some situations it is easier to describe our negative feelings than positive ones. Even more, quite
often one can easily specify objects or alternatives one dislikes, but simultaneously cannot specify clearly what he really
wants.” For more details on Intuitionistic Fuzzy Sets see, e.g. Atanassov [21,22,25], Atanassov and Gargov [23] and Xu [314].

Asharaf et al. [18]                    Lai et al. [158]
Kumar et al. [156]                     Li et al. [161]
Hamidzadeh et al. [123]
Pacheco et al. [217]
Malyszko and Stepaniuk [189]

In the literature, there has been a lively debate among scholars on the appropriateness of the term intuitionistic fuzzy
sets (IFS) adopted by Atanassov for his theory. Grzegorzewski and Mrówka [116] suggested a list of possible alternative
terms that would however allow one to retain the acronym IFS: incomplete fuzzy sets, inaccurate fuzzy sets, imperfect
fuzzy sets, indefinite fuzzy sets, indeterminate fuzzy sets, indistinct fuzzy sets. Dubois et al. [103] suggested inappropriate
fuzzy sets, interval fuzzy sets, imprecise fuzzy sets. Atanassov [24] took part in the discussion, replying to the observations
of the scholars involved in the debate.

The theory of Intuitionistic Fuzzy Sets has been used successfully in cluster analysis for modeling uncertainty in the 
clustering process and various clustering methods have been proposed. 

Hung et al. [129] proposed a fuzzy clustering method based on intuitionistic fuzzy tolerance relations. Pelekis et al.
[234] proposed a variant of FcM that copes with uncertainty and involves a similarity measure between the intuitionistic
fuzzy sets, which is appropriately integrated into the clustering procedure. Torra et al. [292] defined a clustering method to
construct an intuitionistic fuzzy partition that deals with the uncertainty present in different executions of the same
clustering procedure. Xu et al. [316] proposed a clustering algorithm based on the association coefficients of intuitionistic fuzzy
sets. This clustering algorithm is similar to the clustering technique proposed by Hung et al. [129]. Cai et al. [44] suggested
a clustering technique based on the use of the intuitionistic fuzzy dissimilarity matrix and the so-called (α, β)-cutting
matrices. This method is also similar to the one proposed by Hung et al. [129]. Karthikeyani et al. [142] described an
intuitionistic fuzzy approach to distributed fuzzy clustering. A simple clustering method for classifying data represented by
intuitionistic fuzzy estimates has been suggested by Todorova and Vassilev [288]. Xu [313] developed an agglomerative
hierarchical clustering algorithm for classifying ordinary intuitionistic fuzzy sets and interval-valued intuitionistic fuzzy sets.
Xu and Wu [317] defined an intuitionistic fuzzy c-means method for intuitionistic fuzzy sets. Wang et al. [304] proposed a
method for constructing an intuitionistic fuzzy tolerance matrix from a set of intuitionistic fuzzy sets and a netting method
for clustering intuitionistic fuzzy sets via the corresponding intuitionistic fuzzy tolerance matrix. Zhao et al. [345] proposed
an intuitionistic fuzzy minimum spanning tree clustering algorithm to deal with intuitionistic fuzzy information. Zhao et al.
[346] proposed a measure for computing the association coefficient among intuitionistic fuzzy sets and an algorithm for
clustering intuitionistic fuzzy sets, and subsequently extended the algorithm to classify interval-valued intuitionistic fuzzy sets.
Wang et al. [305] defined an intuitionistic fuzzy implication operator and extended the Łukasiewicz implication operator to
intuitionistic fuzzy environments, and then defined an intuitionistic fuzzy triangle product and an intuitionistic fuzzy square
product. Moreover, they used the intuitionistic fuzzy square product to construct an intuitionistic fuzzy similarity matrix,
based on which a direct method for intuitionistic fuzzy clustering has been proposed.

Table 3
Contributions to intuitionistic fuzzy set-based clustering.
Hung et al. [129]                      [126]
Torra et al. [292]                     Thong and Son [284]
Pelekis et al. [234]                   Verma and Agrawal [296]
Xu et al. [316]                        Agrawal and Tripathy [3]
Xu [313]                               Aliahmadipour and Eslami [5]
Cai et al. [44]                        Ananthi et al. [12]
Karthikeyani Visalakshi et al. [142]   Balasubramaniam and Ananthi [27]
Todorova and Vassilev [288]            Chen and Liu [57]
Xu and Wu [317]                        Dubey et al. [101]
Chaira [47]                            Kacprzyk et al. [137]
Wang et al. [304]                      Kaushik et al. [143]
Liu et al. [178]                       Prabu et al. [244]
Rangasamy et al. [250]                 Shang et al. [270]
Son et al. [280]                       Tripathy et al. [293]
Xu [314]                               Tripathy et al. [294]
Zhao et al. [345]                      Verma et al. [297]
Xu et al. [312]                        Zhou et al. [352]
Zhao et al. [346]
Chaira and Panwar [48]
Lin [165]
Wang et al. [305]

Recently, intuitionistic fuzzy clustering methods have been proposed by Agrawal and Tripathy [3], Aliahmadipour and
Eslami [5], Ananthi et al. [12], Chen and Liu [57], Dubey et al. [101], Kacprzyk et al. [137], Prabu et al. [244], Shang et al.
[270], Tripathy et al. [293,294] and Zhou et al. [351].

A chronological list of the main contributions to intuitionistic fuzzy sets-based clustering is shown in Table 3. 

From an Information-Uncertainty perspective, note that the various contributions to intuitionistic fuzzy clustering focus 
on the uncertainty associated with the assigning of objects to clusters, not measured in terms of point membership degree, 
but by means of two uncertainty measurements, i.e. the degree of belonging (degree of membership) and the degree of 
non-belonging (degree of non-membership). 


4.5.1. Vague clustering 

Since fuzzy sets theory is based on the concept of point-based membership, it may describe vagueness inaccurately.
Motivated by this, Gau and Buehrer [109] elaborated the concept of vague sets. In vague sets, the membership function
value is an interval-based membership, i.e., a subinterval of [0,1].
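
As a brief worked illustration, and using symbols that are assumptions of this sketch ($t_A$ for the truth-membership and $f_A$ for the false-membership of a vague set $A$), the interval-based membership can be written in the notation of Section 4.5 as follows:

```latex
% Interval-based membership of x in a vague set A (symbols t_A, f_A assumed)
t_A(x) \le \mu_A(x) \le 1 - f_A(x), \qquad t_A(x) + f_A(x) \le 1 .

% Reading t_A as a degree of membership and f_A as a degree of non-membership,
% the width of the interval plays the role of the degree of hesitancy:
\bigl(1 - f_A(x)\bigr) - t_A(x) = 1 - \bigl(t_A(x) + f_A(x)\bigr) .

% Numerical example: t_A(x) = 0.5, f_A(x) = 0.3
% gives the membership interval [0.5, 0.7], whose width is 0.2.
```

The example shows that a point-based membership is recovered only when the interval collapses, i.e. when $t_A(x) + f_A(x) = 1$.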

In a clustering domain, Xu et al. [312] proposed a fuzzy clustering method in the context of vague sets theory and fuzzy
c-means clustering, i.e., a vague c-means clustering method (VcM). In the objective function of VcM, membership degrees
are not point-based memberships but interval-based memberships based on the so-called truth-memberships and
false-memberships. Note that, as pointed out by Bustince and Burillo [40], vague sets are equivalent to intuitionistic fuzzy sets;
for this reason, the vague c-means clustering method suggested by Xu et al. [312] can be considered an intuitionistic fuzzy
clustering.


4.6. Evidential clustering or credal clustering or Belief clustering 


The theory of belief functions, also referred to as evidence theory or Dempster-Shafer theory (DST), is a general
framework for reasoning with uncertainty, related to probability, possibility and imprecise probability theories. First introduced
by Arthur P. Dempster [90] in the context of statistical inference, the theory was later developed by Glenn Shafer [268] into
a general framework for modeling epistemic uncertainty, i.e. a mathematical theory of evidence. The theory makes it possible
to combine evidence from different sources and arrive at a degree of belief (represented by a mathematical object called
belief function) that takes into account all the available evidence. Dempster-Shafer theory is a generalization of the Bayesian
theory of subjective probability. Belief functions base degrees of belief (or confidence, or trust) for one
question on the probabilities for a related question. The degrees of belief themselves may or may not have the mathematical
properties of probabilities; how much they differ depends on how closely the two questions are related. Dempster-Shafer
theory is based on two principles: obtaining degrees of belief for one question from subjective probabilities for a related
question, and Dempster’s rule for combining such degrees of belief when they are based on independent items of evidence.
In essence, the degree of belief in a proposition depends primarily upon the number of answers (to the related questions)
containing the proposition, and the subjective probability of each answer. Also contributing are the rules of combination
that reflect general assumptions about the data. In this formalism, a degree of belief (also referred to as a mass) is
represented as a belief function rather than a probability distribution. Probability values are assigned to sets of possibilities rather
than single events: their appeal rests on the fact that they naturally encode evidence in favor of propositions. Dempster-Shafer
theory assigns its masses to all of the non-empty subsets of the propositions that compose a system, that is, in set-theoretic
terms, to the elements of the power set of the propositions.

P. D'Urso/Information Sciences 400-401 (2017) 30-62 45
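
The combination of independent items of evidence can be illustrated with a minimal sketch of Dempster's rule in its usual normalized form. Mass functions are represented here as dictionaries from frozensets to masses; this representation, the function names and the toy frame {a, b} are assumptions made for the example.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule (normalized form)."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

def belief(m, A):
    """Total mass of the subsets contained in A."""
    return sum(v for s, v in m.items() if s <= A)

def plausibility(m, A):
    """Total mass of the subsets that intersect A."""
    return sum(v for s, v in m.items() if s & A)

a, b = frozenset({"a"}), frozenset({"b"})
ab = a | b
m1 = {a: 0.5, ab: 0.5}   # first source: partly supports a, partly ignorant
m2 = {b: 0.5, ab: 0.5}   # second source: partly supports b, partly ignorant
m12 = dempster_combine(m1, m2)
```

In this example a quarter of the product mass is conflicting and is removed by the normalization, so the combined masses on {a}, {b} and {a, b} are all equal; the belief in {a} is then lower than its plausibility, which is how the formalism separates supported from merely possible propositions.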

Subsequently, Cozman [68-70] proposed the so-called Credal Set (CS) theory, which is a generalization of Bayesian theory,
in which one acknowledges that there might be more than one reasonable function to represent belief and evidence. Thus,
CS theory is strongly associated with the theory of evidence [90,268].

For more details on Dempster-Shafer theory and Credal Set (CS) theory see, respectively, Dempster [90], Shafer [268] and
Cozman [68-70]. In the last decade, the credal theory and the Dempster-Shafer theory, due to their strong theoretical and
methodological potential, have had great success in Statistical Reasoning, especially thanks to the French school headed
by Denœux and Masson, who produced several relevant theoretical results, above all in the field of cluster analysis. The
clustering methods based on the credal theory and the Dempster-Shafer theory are often indicated in the literature by
different terms with the same meaning, such as Evidential clustering, Belief clustering, Dempster-Shafer clustering and Credal
clustering.

Credal clustering (or evidential clustering) represents the uncertainty about the membership of objects to clusters using 
the formalism of belief functions. In particular, in a credal partition, the membership of each object to clusters is described 
by a mass function, i.e., a function that assigns a mass between 0 and 1 to each set of clusters, with the constraint that the 
masses sum to 1. 
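
A credal partition can be sketched as one mass function per object. The example below is illustrative only: the cluster labels, the object names and the restriction to non-empty sets of clusters are assumptions of this sketch.

```python
from itertools import combinations

def nonempty_subsets(clusters):
    """All non-empty sets of clusters (the focal sets allowed in this sketch)."""
    return {frozenset(s)
            for r in range(1, len(clusters) + 1)
            for s in combinations(clusters, r)}

def is_mass_function(m, clusters, tol=1e-9):
    """Check that masses lie in [0, 1], sit on sets of clusters and sum to 1."""
    allowed = nonempty_subsets(clusters)
    return (all(s in allowed and 0.0 <= v <= 1.0 for s, v in m.items())
            and abs(sum(m.values()) - 1.0) <= tol)

clusters = ("w1", "w2")
credal_partition = {
    # All mass on one cluster: the object is assigned as in a hard partition.
    "obj1": {frozenset({"w1"}): 1.0},
    # Mass only on single clusters: comparable to fuzzy membership degrees.
    "obj2": {frozenset({"w1"}): 0.6, frozenset({"w2"}): 0.4},
    # All mass on the set of both clusters: the assignment is fully unknown.
    "obj3": {frozenset({"w1", "w2"}): 1.0},
}
valid = {name: is_mass_function(m, clusters) for name, m in credal_partition.items()}
```

The three objects show why a credal partition is more expressive than a point-based membership: it can state certainty, graded membership, and plain lack of information within the same formalism.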

Paraphrasing the same authors, the first incursion of belief function into cluster analysis domain it was carried out by 
Denceux and Masson in 2004. In fact, Denceux and Masson [97] proposed a novel approach to clustering proximity data, 
based on the belief functions, called evidential clustering (EVCLUS). In their clustering method, the allocation of objects to 
classes is performed using the concept of basic belief assignment (bba), whereby a “mass of belief’ is assigned to each 
possible subset of classes. Having assigned a bba to each object, it is possible to compute, for each two objects, the plausi- 
bility that they belong to the same class. It is then required that these plausibilities be, in some sense, compatible with the 
observed pairwise dissimilarities between objects [97]. “As the concept of fuzzy partition subsumes that of crisp partition, 
resulting in greater expressive power of fuzzy clustering procedures as compared with hard ones, the concept of credal par- 
tition introduced [in EVCLUS] is even more general, which allows in some cases to gain deeper insight into the structure 
of the data. Additionally, evidential clustering provides the possibility to combine in a meaningful way credal partitions ob- 
tained from dissimilarity matrices provided, e.g., by several experts, or computed from different sets of measurements” [97]. 
In the same year, Masson and Denoeux [97] suggested a new version of EVCLUS for interval-valued proximity data.
Subsequently, the same authors [190] proposed an evidential version of the fuzzy c-means clustering, i.e. evidential c-means (ECM),
in which they address the problem of computing a credal partition from object data. Masson and Denoeux [191] also
defined a relational evidential c-means algorithm (RECM) for proximity data. In 2011, Masson and Denoeux [192] investigated,
in the belief functions framework (defined on the lattice of interval partitions of a set of objects), the problem of finding
a consensus clustering as an information fusion problem in the general framework of uncertain reasoning. Antoine et al.
[14] suggested introducing pairwise constraints in the ECM method in order to define the so-called constrained evidential
c-means algorithm (CECM), which combines the advantages of adding background knowledge and using belief functions.
Denoeux and Masson [98] discussed evidential reasoning in large partially ordered sets with application to multi-label
classification, ensemble clustering and preference aggregation. Liu et al. [177] proposed a new Belief C-Means (BCM)
algorithm, in which the mass of belief of the specific cluster for each object is computed from the distance between the object
and the center of the cluster, and the mass of belief of a meta-cluster is computed both from the distances between the object
and the prototypes of the involved specific clusters, and from the distances among these prototypes. In BCM, there is no need to
compute the barycenter of the meta-clusters. Serir et al. [267] suggested an evidential evolving Gustafson-Kessel algorithm
for online data stream partitioning using belief function theory. Antoine et al. [15] proposed an evidential clustering with
instance-level constraints for relational data (CEVCLUS). Denoeux et al. [96] introduced a new decision-directed evidential
clustering algorithm (EK-NNclus) based on the EK-NN rule [92]. Zhou et al. [348] extended the median clustering methods
in the framework of belief functions theory and put forward the Median Evidential C-Means (MECM) algorithm. By using
a new concept of meta-clusters, Liu et al. [179] proposed an evidential version of FCM called credal c-means (CCM) to
overcome the limitation of ECM. The CCM method differs from the BCM method because it is based on a distinct underlying
principle and a different interpretation of the meta-clusters. CCM also works with a credal partition for the clustering of imprecise
data based on belief functions. Guo and Sengur [120] suggested a neutrosophic (see Section 4.9) evidential c-means (NECM).
Denoeux et al. [95] proposed some improvements to the EVCLUS algorithm, making it applicable to very large datasets. Zhou
et al. [351] introduced the so-called ECMdd method, i.e. the evidential c-medoids clustering method with multiple prototypes.
For other recent papers on evidential clustering, see Denoeux and Kanjanatarakul [93,94], Kanjanatarakul et al. [139], Li and
Wang [164].

Recently, various algorithms introduced by the French school headed by Denoeux and Masson, i.e. Evidential c-Means
(ECM), Relational Evidential c-Means (RECM), Constrained Evidential c-Means (CECM), EVCLUS and EK-NNclus, have
been implemented in the R package “evclust: Evidential Clustering” [91]. For more details, see the link:
https://cran.r-project.org/web/packages/evclust/vignettes/Introduction.html

The various contributions on credal clustering (evidential clustering) are shown chronologically in Table 4.

Table 4
Contributions on credal clustering (evidential clustering).

Denoeux and Masson [97]       Zhou et al. [350]
Masson and Denoeux [190]      Liu et al. [179]
Masson and Denoeux [191]      Guo and Sengur [120]
Masson and Denoeux [192]      Denoeux [91]
Antoine et al. [14]           Denoeux and Kanjanatarakul [93]
Denoeux and Masson [98]       Denoeux et al. [95]
Liu et al. [178]              Kanjanatarakul et al. [139]
Serir et al. [267]            Li and Wang [164]
Antoine et al. [15]           Zhou et al. [351]
Denoeux et al. [96]           Denoeux and Kanjanatarakul [94]





Note that the various clustering algorithms previously illustrated produce a credal partition, i.e., a set of Dempster-Shafer 
mass functions representing the membership of objects to clusters. The mass functions quantify the cluster-membership 
uncertainty of the objects. 


4.7. Credibilistic clustering 


Fuzzy set theory has been extensively developed and applied to a wide variety of real-life problems. Among these de- 
velopments are the possibility theory and the credibility theory, respectively based on the measure of possibility and of 
credibility. 

Although possibility measures are widely used, they are not self-dual. As remarked by Liu and Liu [173], self-dual
measures are absolutely needed in both theory and practice. To address this issue, Liu and Liu proposed in 2002 [176] the
use of credibility measures, which are self-dual and, in this respect, similar to probability measures. Since then, credibility
measures have been widely applied, e.g., to the definition of expected value [176] or of credibility distribution [173]. Li and
Liu [163] also studied the relation between possibility measures and credibility measures, and demonstrated a sufficient and
necessary condition for credibility measures. They showed that, compared to possibility measures, the advantage of credibility
measures is their self-dual property.

Generally speaking, credibility theory is the branch of mathematics that studies the behavior of fuzzy events. A detailed 
survey of credibility theory may be found in Liu [175]. An axiomatic foundation for credibility theory was provided by Liu 
in 2004 [174]. 

Recently, credibilistic theory has had a positive impact in some areas of Statistical Reasoning, such as cluster analysis. 
Zhou et al. [347] proposed a clustering method based on FCM. They applied credibility weights to measure the compactness 
of data, but they did not apply them to minimize the objective function because of predefined membership functions. Their 
method has been used by Wen et al. [307] as a basis for an image segmentation algorithm. Zhou et al. [349] applied a 
hybrid method of spatial credibilistic clustering [347] and particle swarm optimization (SCCPSO). Wen et al. [308] suggested 
a modified SCCPSO. Niakan et al. [216] proposed a new credibilistic clustering method in which the credibility
measure is applied instead of the possibility measure used in possibilistic clustering. In Kalhori and Zarandi [138], the credibility
theory is used with a different approach. “In particular, the credibility of an event, as the average of possibility and necessity 
of it, is pointed out. So, the possibility of the event, membership of a data in a cluster, and its necessity are calculated from 
objective function. Then, the credibility of this event is calculated using average of possibility and necessity. The possibility 
of membership of a data in a cluster is gained considering the distance of it from the other cluster centers; on the other 
hand, the possibility of non-membership of this data in the cluster is gained regarding the effect of membership of this 
data in the cluster on extension of its borders and decrement of its separation from the other clusters. The necessity of the 
event membership of the data in a cluster is complement of the possibility of event non-membership. After calculating the 
possibility and necessity of this event, its credibility is calculated as the average of its possibility and necessity” [138]. Note
that the objective function of the method suggested by Kalhori and Zarandi is of interval type-2 (see Section 4.8) and is based
on credibility measures; for this reason, the method is called Interval Type-2 Credibilistic Clustering (IT2CC).
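
The relation between possibility, necessity and credibility recalled above can be illustrated with a small Python sketch on a finite universe. The possibility distribution below is hypothetical, the function names are ours, and the code only illustrates the generic identity "credibility = average of possibility and necessity"; it is not the IT2CC objective function of Kalhori and Zarandi [138].

```python
def possibility(pi, event):
    """Possibility of an event: largest possibility degree among its elements."""
    return max((pi[x] for x in event), default=0.0)


def necessity(pi, event):
    """Necessity of an event: 1 minus the possibility of the complementary event."""
    complement = set(pi) - set(event)
    return 1.0 - possibility(pi, complement)


def credibility(pi, event):
    """Credibility of an event: average of its possibility and its necessity."""
    return 0.5 * (possibility(pi, event) + necessity(pi, event))


# Hypothetical possibility degrees that a datum belongs to each of three clusters.
pi = {"c1": 1.0, "c2": 0.6, "c3": 0.2}
event = {"c1"}
complement = set(pi) - event

print(possibility(pi, event), necessity(pi, event), credibility(pi, event))
print(credibility(pi, event) + credibility(pi, complement))
```

The last line prints the sum of the credibility of an event and of its complement, which equals 1: this is the self-dual property that possibility measures lack.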

To solve the “coincident problem” affecting possibilistic clustering, Zhou et al. [348] suggested a credibilistic
clustering method. In the proposed clustering algorithm, the objective function is the compactness index of the data sets with
credibilistic membership weights, and the constraints on credibilities are deduced from the mathematical properties of the
credibility measure. Furthermore, in order to solve this model with good clustering results, a credibilistic clustering
algorithm based on the alternating cluster estimation method is designed. For more details, see Zhou et al. [348].

In line with the Informational-Uncertainty framework, uncertainty is measured in this case by means of a credibilistic 
measure. 


4.8. Type-2 fuzzy clustering 

The Type-2 Fuzzy Set (T2FS) was introduced by Zadeh [353] as an extension of the concept of an ordinary Fuzzy Set (or Type-1
Fuzzy Set); it is essentially a “fuzzy fuzzy” set, in which the degree of membership is itself a type-1 fuzzy set. Subsequently,
it was investigated by Mizumoto and Tanaka [208,209].

Fig. 7. Gaussian membership function for a Type-1 fuzzy set, for a Type-2 fuzzy set representing a type-1 fuzzy set with uncertain standard deviation and
for a Type-2 fuzzy set representing a type-1 fuzzy set with uncertain mean (the mean is uncertain in the interval [0.4, 0.6]) (source: [46]).


A Type-2 Fuzzy Set (T2FS) is characterized by a fuzzy membership function, i.e., the membership grade for each element 
of this set is a fuzzy set in [0,1], unlike a type-1 set where the membership grade is a crisp number in [0,1]. Such a set can
be used in situations where there is uncertainty about the membership grades themselves, e.g., an uncertainty in the shape 
of the membership function or in some of its parameters [46]. 

When we cannot determine the membership of an element in a set as 0 or 1, we use Type-1 Fuzzy Sets; when the
situation is so fuzzy that we have trouble determining the membership grade even as a crisp number in [0,1], we use Type-2
Fuzzy Sets. This does not mean that we need to have extremely fuzzy situations to use Type-2 Fuzzy Sets. Type-1 Fuzzy
Sets can be considered as a first order approximation to real-world uncertainty and Type-2 Fuzzy Sets as a second order
approximation [46].

It is possible to consider fuzzy sets of higher types but the complexity of the fuzzy system increases very rapidly. For 
this reason, we will only focus on Type-2 Fuzzy Sets. 

In Fig. 7, we show an example of Gaussian membership function for a Type-1 fuzzy set (Fig. 7a), for a Type-2 fuzzy set 
representing a type-1 fuzzy set with uncertain standard deviation (Fig. 7b) and for a Type-2 fuzzy set representing a type-1 
fuzzy set with uncertain mean (the mean is uncertain in the interval [0.4, 0.6]) (Fig. 7c) drawn by Castillo and Melin [46]. 

In Fig. 8, we consider a Type-1 fuzzy set characterized by a Gaussian membership function (mean $M$ and standard
deviation $\sigma_x$), which gives one crisp membership $m(x)$ for each input $x \in X$, where


$$m(x) = \exp\left\{-\frac{1}{2}\left[\frac{x - M}{\sigma_x}\right]^2\right\}. \qquad (2)$$


Fig. 8. A Type-2 fuzzy set in which the membership grade of every domain point is a Gaussian Type-1 fuzzy set (source: [46]).

Now, imagine that this membership of x is a fuzzy set. Let us call the domain elements of this set “primary memberships”
of x (denoted by $\mu_1$) and the membership grades of these primary memberships “secondary memberships” of x [denoted by
$\mu_2(x, \mu_1)$]. So, for a fixed x, we get a Type-1 fuzzy set whose domain elements are primary memberships of x and whose
corresponding membership grades are secondary memberships of x. If we assume that the secondary memberships follow
a Gaussian with mean $m(x)$ and standard deviation $\sigma_m$, as in Fig. 8, we can describe the secondary membership function
for each x as


$$\mu_2(x, \mu_1) = \exp\left\{-\frac{1}{2}\left[\frac{\mu_1 - m(x)}{\sigma_m}\right]^2\right\},$$


where $\mu_1 \in [0,1]$ and $m$ is as in Eq. (2) [46].
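
The two Gaussian formulas above can be evaluated directly. The following Python sketch does so under stated assumptions: the parameter values (M = 0.5, a spread of 0.1 for the primary Gaussian and 0.05 for the secondary one) are hypothetical, and the function names are ours.

```python
import math


def primary_center(x, M, sigma_x):
    """Type-1 Gaussian membership m(x) with mean M and standard deviation sigma_x."""
    return math.exp(-0.5 * ((x - M) / sigma_x) ** 2)


def secondary_membership(x, mu1, M, sigma_x, sigma_m):
    """Secondary grade of the primary membership mu1 at x.

    The secondary membership is a Gaussian in mu1, centered at m(x)
    and with standard deviation sigma_m.
    """
    m_x = primary_center(x, M, sigma_x)
    return math.exp(-0.5 * ((mu1 - m_x) / sigma_m) ** 2)


# Hypothetical parameters, for illustration only.
M, sigma_x, sigma_m = 0.5, 0.1, 0.05
x = 0.6
m_x = primary_center(x, M, sigma_x)

# The secondary grade is maximal (equal to 1) when mu1 coincides with m(x)
# and decreases as mu1 moves away from it.
for mu1 in (m_x - 0.1, m_x, m_x + 0.1):
    print(round(mu1, 3), round(secondary_membership(x, mu1, M, sigma_x, sigma_m), 3))
```

For a fixed x, the loop therefore traces a type-1 fuzzy set over the primary memberships, which is exactly the vertical slice of the Type-2 fuzzy set at x.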
We can define the two most commonly used kinds of Type-2 fuzzy sets as follows [46]:


• a Gaussian Type-2 Fuzzy Set (GaT2FS) is one in which the membership grade of every domain point is a Gaussian type-1
set contained in [0,1];

• an Interval Type-2 Fuzzy Set (IT2FS) is one in which the membership grade of every domain point is a crisp set whose
domain is some interval contained in [0,1].


To distinguish the general definition of T2FS from particular cases such as GaT2FS and IT2FS, the literature
often uses the acronym GT2FS (General Type-2 Fuzzy Set) for the general case.

We observe that uncertainty in the primary memberships of a type-2 fuzzy set, A, consists of a bounded region that we 
call the footprint of uncertainty (FOU). Mathematically, it is the union of all primary membership functions [195]. 

An upper membership function and a lower membership function are two Type-1 membership functions that are bounds
for the FOU of a Type-2 fuzzy set A. The upper membership function is associated with the upper bound of FOU(A). The 
lower membership function is associated with the lower bound of FOU(A). 

In Fig. 9 (drawn by [46]) we illustrate the concept of upper and lower membership functions as well as the footprint of 
uncertainty for a Type-2 Gaussian membership function. This footprint of uncertainty can be obtained by projecting in two 
dimensions the three-dimensional view of the Type-2 Gaussian membership function. 

Mathematically speaking, a Type-2 Fuzzy Set (T2FS) can be defined, in general, in the following way.

Let $X$ be an initial universe. A Type-2 Fuzzy Set (T2FS) $A \subset X$ is defined as follows:


$$A = \{(x, u, \xi_A(x, u)) : x \in X,\ u \in [0, 1]\},$$


where $\xi_A(x, u)$ represents the degree of membership of $(x, u)$ to $A$, $x \in X$, $u \in [0, 1]$.
Often, it is more convenient to write $\xi_A(x, u)$ as the product $\mu_A(x)\,\nu_A(x, u)$, where


$$\mu_A(x) = \max_{u}\, \xi_A(x, u),$$


$$\nu_A(x, u) = \begin{cases} \xi_A(x, u)/\mu_A(x) & \text{if } \mu_A(x) \neq 0, \\ 1 & \text{otherwise} \end{cases}$$


are known, respectively, as the primary and the secondary membership functions [197].
Fig. 9. Upper and lower membership functions and footprint of uncertainty for a Type-2 Gaussian membership function (source: [46]).


We delineate the region $\xi_A(x, u) > 0$ by means of a lower membership function $\underline{u}_A(x)$ and an upper membership function
$\overline{u}_A(x)$, where


$$\xi_A(x, u) > 0 \quad \text{if} \quad \underline{u}_A(x) \le u \le \overline{u}_A(x).$$


Physically, the difference $|\overline{u}_A(x) - \underline{u}_A(x)|$ represents the uncertainty in specifying the primary membership value $u(x)$
[197].
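
As a concrete illustration of the lower and upper membership functions and of the width of the footprint of uncertainty, the sketch below considers a Gaussian primary membership function whose mean is only known to lie in the interval [0.4, 0.6], as in Fig. 7c. The standard deviation 0.1 is a hypothetical value and the function names are ours.

```python
import math


def gauss(x, mean, sigma):
    """Type-1 Gaussian membership function."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def fou_bounds(x, mean_lo, mean_hi, sigma):
    """Lower and upper membership at x for a Gaussian with mean in [mean_lo, mean_hi].

    Upper bound: best case over the admissible means (equal to 1 inside the interval).
    Lower bound: worst case, reached at the endpoint farthest from x.
    """
    if x < mean_lo:
        upper = gauss(x, mean_lo, sigma)
    elif x > mean_hi:
        upper = gauss(x, mean_hi, sigma)
    else:
        upper = 1.0
    lower = min(gauss(x, mean_lo, sigma), gauss(x, mean_hi, sigma))
    return lower, upper


# Hypothetical spread; the mean interval follows Fig. 7c.
for x in (0.3, 0.5, 0.7):
    lower, upper = fou_bounds(x, 0.4, 0.6, 0.1)
    print(x, round(lower, 4), round(upper, 4), round(upper - lower, 4))
```

The last printed column is the difference between upper and lower membership, i.e. the local width of the footprint of uncertainty at x.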

Thus, from the definition of T2FS, we can obtain the definition of IT2FS as follows.

Let $X$ be an initial universe. An Interval Type-2 Fuzzy Set (IT2FS) $A \subset X$ is defined as follows:


$$A = \{(x, u, \xi_A(x, u)) : x \in X,\ u \in [0, 1]\},$$


where $\xi_A(x, u) = 1$, $x \in X$, $u \in [0, 1]$.

Type-2 Fuzzy Sets have been widely and fruitfully used in a clustering context to manage different kinds of uncertainty
affecting the clustering process. In particular, Type-2 Fuzzy Sets have been used in clustering for managing uncertainty
related to the value of suitable parameters of the clustering algorithms. Ozkan and Turksen (2004) considered uncertainties
of various parameters arising from imperfect information on patterns when applying fuzzy c-means. In particular, they focused on
the uncertainty regarding the fuzziness parameter m according to entropies after removing uncertainties from the other
parameters (e.g. number of clusters, cluster centers, and so on). Rhee and Hwang published several papers on uncertain
fuzzy clustering based on IT2FSs; in these papers they discussed the extension of several T1FS-based clustering methods
(FCM) into IT2FSs, namely the IT2 fuzzy perceptron (Rhee and Hwang, 2002 [130]), the IT2 fuzzy k-nearest neighbor algorithm
[253], the IT2 fuzzy c-spherical shell algorithm [131], and the IT2 FCM algorithm ([251,252,254]). In all the papers published in
the period 2002-2004, they studied how to define and manage uncertainty for distance measures when fuzzy membership
functions are designed in clustering. In 2007, Hwang and Rhee [359] proposed the so-called Interval Type-2 Fuzzy c-Means
(IT2FCM) clustering method. In particular, they focused their attention on the representation and management of the uncertainty
that is present in the fuzzy memberships of the patterns and is associated with the variation of the fuzzifier parameter m, which controls
the amount of fuzziness of the final fuzzy partition obtained by means of FCM. To design and manage the uncertainty
for m, they extended a pattern set to interval Type-2 fuzzy sets utilizing two fuzzifiers $m_1$ and $m_2$, which create a footprint of
uncertainty (FOU) for the fuzzifier parameter m. Then, they incorporated this interval Type-2 fuzzy set into FCM to observe
the effect of managing uncertainty from the two fuzzifiers. They computed the lower and upper memberships $[\underline{u}_j, \overline{u}_j]$ and
subsequently the estimated centers by using the centroid type-reduction, represented by the interval $[v_L, v_R]$. The Karnik-
Mendel (KM) iterative algorithm was proposed to estimate the left endpoint $v_L$ and the right endpoint $v_R$ of an interval cluster center when
performing type-reduction during center updating (see Hwang and Rhee [359]).
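
The way two fuzzifiers generate an interval of memberships can be sketched in a few lines of Python. The sketch below is a simplification under stated assumptions: it computes the standard FCM membership degrees of one pattern for each fuzzifier and takes their pointwise minimum and maximum as lower and upper memberships, whereas the rule in [359] is stated through a case distinction; type-reduction and the KM iterations are not included. The data point, the cluster centers and the fuzzifier values are hypothetical.

```python
import math


def fcm_memberships(point, centers, m):
    """Standard FCM membership degrees of one point to each center, for fuzzifier m > 1."""
    d = [max(math.dist(point, v), 1e-12) for v in centers]  # guard against zero distance
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]


def it2_memberships(point, centers, m1, m2):
    """Lower and upper memberships induced by the two fuzzifiers m1 and m2.

    Assumption: bounds are the pointwise min and max of the two type-1 memberships.
    """
    u1 = fcm_memberships(point, centers, m1)
    u2 = fcm_memberships(point, centers, m2)
    lower = [min(a, b) for a, b in zip(u1, u2)]
    upper = [max(a, b) for a, b in zip(u1, u2)]
    return lower, upper


# Hypothetical pattern, cluster centers and fuzzifiers.
point = (1.0, 0.0)
centers = [(0.0, 0.0), (4.0, 0.0)]
lower, upper = it2_memberships(point, centers, m1=2.0, m2=3.0)
print(lower, upper)
```

The gap between `lower` and `upper` for each cluster is the part of the footprint of uncertainty attributable to not knowing which fuzzifier in the chosen pair is appropriate.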

Min et al. [198] suggested an IT2F Possibilistic c-means clustering (IT2FPcM) incorporating IT2FSs into the Possibilistic
c-means clustering (PcM). They defined upper and lower memberships utilizing two different fuzzifiers $m_1$ and $m_2$ (see
[198]).

Ji et al. [134] incorporated the IT2FSs into the hybrid fuzzy clustering scheme (fuzzy-possibilistic scheme), and thus
proposed the interval Type-2 fuzzy possibilistic c-means (IT2FPcM) clustering algorithm. They used both fuzzy memberships
and possibilistic typicalities to model the uncertainty implied in the data sets, and developed solutions to overcome the
difficulties caused by Type-2 fuzzy sets, such as the construction of the footprint of uncertainty, type-reduction and defuzzification.

Table 5
Contributions on Type-2 fuzzy clustering.

Rhee and Hwang [252]          Golsefid and Zarandi [113]
Rhee and Hwang [253]          Golsefid et al. [114]
Hwang and Rhee [131]          Nguyen and Nahavandi [215]
Rhee [251]                    Pham et al. [243]
Hwang and Rhee [359]          Rubio et al. [258]
Min et al. [198]              Rubio et al. [259]
Ji et al. [134]               Wu and Liu [309]
Linda and Manic [166]         Yao et al. [324]
Rubio and Castillo [256]      Comas et al. [60]
Sanchez et al. [264]          Rubio and Castillo [257]

In particular, they defined the lower and upper interval fuzzy membership and possibilistic typicality using, respectively,
the fuzzifiers $(m_1, m_2)$ and $(p_1, p_2)$. Then, they computed the interval of a primary fuzzy membership as $[\underline{u}_{ij}, \overline{u}_{ij}]$ and the interval
of a primary possibilistic typicality as $[\underline{t}_{ij}, \overline{t}_{ij}]$ (see [134]).

Note that all the previous clustering methods are based on a particular case of T2FSs, i.e. the IT2FSs. Although the original
T2FSs may be useful in modeling uncertainty, the operations on T2FSs involve numerous embedded T2FSs, and thus require
an undesirably large amount of computation to consider all possible combinations of secondary membership values. Therefore,
IT2FSs were proposed to reduce the computational complexity [195]. For this reason, i.e. because of the computational
intensity of processing T2FSs, only their constrained version, i.e. the interval T2 (IT2) FSs, was typically used in
cluster analysis. Fortunately, the recently introduced concepts of α-planes and zSlices allow for efficient representation of and
computation with T2FSs. Following this development, Linda and Manic [166] proposed a new approach for uncertain fuzzy
clustering using the T2FSs in the fuzzy c-means clustering, called Type-2 Fuzzy C-means clustering (T2FCM) or General Type-
2 Fuzzy C-means clustering (GT2FCM). The proposed method builds on top of the previously published IT2 FCM algorithm,
which is extended via the α-planes representation theorem [166].

In particular, Linda and Manic [166] proposed a novel method for managing the uncertainty associated with the selection 
of the fuzzifier parameter m for the FCM algorithm, a selection that has a direct impact on the location and quality of the 
cluster partition. 

The original T1FCM algorithm requires the specification of a precise fuzzifier value m. The IT2FCM algorithm accepts an
interval-valued fuzzifier $[m_1, m_2]$, which resembles a uniform uncertainty about the appropriate value of the fuzzifier m. The
T2FCM algorithm proposed by Linda and Manic [166] accepts a linguistic description of the fuzzifier value expressed as a
T1 fuzzy set (e.g., “small” or “high”). The resulting cluster membership functions are implemented as T2 FSs represented
using the α-planes theorem. A novel hard-partitioning rule is proposed for the final input-cluster assignment. In addition,
the quasi-T2 (QT2) FCM algorithm is also introduced as a simplified version of the GT2FCM method. For more details, see
Linda and Manic [166].
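
To make the link between a linguistic fuzzifier and interval-valued fuzzifiers more concrete, the following Python sketch slices a T1 fuzzy set into α-cuts. We assume, purely for illustration, that the term “small” is modeled as a triangular fuzzy number with hypothetical parameters (1.5, 2.0, 2.5); each α-cut is then an interval fuzzifier of the kind accepted by an IT2 FCM step. This is only an illustration of the decomposition idea and not the algorithm of Linda and Manic [166].

```python
def alpha_cut_triangular(a, b, c, alpha):
    """Alpha-cut [left, right] of a triangular fuzzy number with support [a, c] and peak b."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return a + alpha * (b - a), c - alpha * (c - b)


def fuzzifier_alpha_planes(a, b, c, levels):
    """Map each alpha level to the interval-valued fuzzifier given by the alpha-cut."""
    return {alpha: alpha_cut_triangular(a, b, c, alpha) for alpha in levels}


# Hypothetical linguistic term "small" for the fuzzifier m.
planes = fuzzifier_alpha_planes(1.5, 2.0, 2.5, levels=(0.0, 0.5, 1.0))
for alpha, (m_left, m_right) in planes.items():
    print(alpha, m_left, m_right)
```

At α = 0 the interval is the whole support of the linguistic term, while at α = 1 it collapses to the single value used by the T1 FCM algorithm; intermediate levels give progressively narrower interval fuzzifiers.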

Other recent interesting Type-2 fuzzy clustering methods have been suggested by Dang et al. [83], Rubio and Castillo 
[256], Sanchez et al. [264], Golsefid and Zarandi [113], Golsefid et al. [114], Nguyen and Nahavandi [215], Pham et al. [243],
Rubio et al. [258,259], Wu and Liu [309], Yao et al. [324], Comas et al. [60], Rubio and Castillo [257]. 

A chronological list of the various contributions on Type-2 fuzzy clustering is shown in Table 5.

Summing up, in line with the Information-Uncertainty perspective, clustering methods based on IT2FSs and T2FSs manage the
uncertainty associated with the fuzzifier parameter m using, respectively, an interval-valued fuzzifier $[m_1, m_2]$, which resembles a uniform
uncertainty about the appropriate value of the fuzzifier m (IT2FS case), and linguistic terms such as “small” or “high”, which
are modeled as T1FSs (T2FS case).


4.9. Neutrosophic clustering 


Neutrosophic Sets (NSs) theory was introduced by Smarandache [274] as a generalization of Fuzzy Sets (FSs) and
Intuitionistic Fuzzy Sets (IFSs) based on neutrosophy, a branch of philosophy dealing with the origin, nature and scope of
neutralities, and their interactions with different ideational spectra [275]. As illustrated in Guo and Sengur [360], an element
<E> in a neutrosophic set is considered in relation to its opposite <Anti-E> and its neutrality <Neut-E>, which is neither <E>
nor <Anti-E>, and three memberships are employed to measure the degree of truth, indeterminacy and falsity of <E>. Thus,
NSs can evaluate not only the degree of truth or falsity but also the degree of indeterminacy. Based on this characteristic,
neutrosophic theory provides a powerful tool to deal with indeterminacy, and has found practical applications in a variety
of different fields (Guo and Sengur [360]).

Formally, a Single-Valued Neutrosophic Set (SVNS) can be defined as follows. Let $X$ be an initial universe. A Single-Valued
Neutrosophic Set (SVNS) $A \subset X$ is defined as follows:


$$A = \{(x, t_A(x), i_A(x), f_A(x)) : x \in X\},$$

Table 6
Contributions on neutrosophic set-based clustering.

Shan et al. [269]             Guo and Sengur [119]
Yu et al. [330]               Guo and Sengur [120]
Anter et al. [13]             Alsmadi [9]
M.F. [13]                     Guo et al. [121]
Guo and Sengur [118]          Huang [127]
Ye [327]                      Karaaslan [141]
Ye [328]                      Koundal et al. [149]
Akhtar and Ahmad [4]          Ye [329]


where $t_A(x)$ represents the truth-membership degree of x to A, $i_A(x)$ indicates the indeterminacy-membership degree of x
to A and $f_A(x)$ denotes the falsity-membership degree of x to A, with the constraints:


$$t_A(x) \in [0, 1],\ i_A(x) \in [0, 1],\ f_A(x) \in [0, 1] \ \text{and} \ 0 \le t_A(x) + i_A(x) + f_A(x) \le 3, \ \forall x \in X.$$
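
A short Python sketch may help to read these constraints: the three degrees are assessed independently, so their sum is only bounded by 3 and may exceed 1. The function name and the numerical values are hypothetical.

```python
def is_valid_svns_element(t, i, f):
    """Check the SVNS constraints on truth, indeterminacy and falsity degrees."""
    in_unit_interval = all(0.0 <= v <= 1.0 for v in (t, i, f))
    # The bound on the sum follows from the three unit-interval constraints;
    # it is kept here only to mirror the definition.
    return in_unit_interval and 0.0 <= t + i + f <= 3.0


# Hypothetical degrees: the sum is 1.8, which is admissible in an SVNS.
print(is_valid_svns_element(0.8, 0.6, 0.4))
# A degree outside [0, 1] is not admissible.
print(is_valid_svns_element(1.2, 0.0, 0.0))
```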


Although the theory of Neutrosophic Sets was introduced eighteen years ago, it has begun to be used only recently in
Statistical Reasoning and, in particular, in cluster analysis.

More specifically, the theory has been recently applied to clustering in order to take suitably into account the uncertainty
affecting the clustering process. Among the most relevant contributions we have the neutrosophic c-means clustering method
proposed by Guo and Sengur [119]. In particular, Guo and Sengur [119] suggested a neutrosophic set-clustering method,
in which the degrees of belonging to the determinant and indeterminate clusters are computed at the same time for each
of the data points. They considered in the clustering procedure a membership degree to determinant clusters and two other
memberships to two kinds of indeterminate clusters, i.e. an ambiguity cluster and an outlier cluster, for each
data point. “Ambiguity cluster allows us to consider about the data points that are laying near the clusters
boundaries and outlier cluster allows us to reject individual data points when they are very far from the centers of each 
cluster. Both ambiguity and outlier clusters are introduced in the clustering iterations and not in the decision processing. 
The membership degrees to the ambiguity and outlier class of a data point are explicit, and these values are learned in the 
iterative clustering problem. So, the membership functions are more immune to noise and they correspond more closely to 
the notion of compatibility” [119]. By considering an extension of the classical Dempster-Shafer (DS) theory (see Section 4.6),
the so-called Dezert-Smarandache theory (DSmT) of plausible and paradoxical reasoning, Guo and Sengur [120] suggested
a new clustering method called neutrosophic evidential c-means (NECM), based on NS and DSmT. In the NECM, the authors
reformulate the objective function adopting a neutrosophic set to choose a suitable method for determining the mass
function in the DSmT evidence theory. NS computes the truth degree (T), falsity degree (F), and indeterminacy degree (I) for
each of the data points. While T is used as the membership degree in clustering algorithms, I and F are considered to define
an ambiguity cluster and an outlier cluster, respectively, for each data point. The ambiguity cluster allows us to consider
the data points that are lying near the cluster boundaries, and the outlier cluster allows us to reject individual data points
when they are very far from the centers of each cluster. The final belonging decision for each data point is transformed into
an information fusion problem solved by DSmT. A new mass function was defined using membership degree, ambiguity
degree, and outlier degree. The DSmT combination rule and decision are applied to obtain the final clustering result [120].
Akhtar and Ahmad [4] suggested a modified fuzzy c-means clustering using neutrosophic logic. Karaaslan [141] defined a
new structure called Single-Valued Neutrosophic Refined Soft Set (SVNRSS) and two correlation coefficients for SVNRSS, and
then suggested a clustering method based on these correlation coefficients.

The contributions on neutrosophic clustering suggested in the literature (most of the papers are very recent) are listed
chronologically in Table 6.

Thus, in line with the Information-Uncertainty perspective, by means of Neutrosophic Sets-based clustering, the uncertainty
associated with the clustering process is managed by means of three types of membership degrees, i.e. truth-membership
degree, indeterminacy-membership degree and falsity-membership degree.


4.10. Hesitant fuzzy clustering 


Torra and Narukawa [290] and Torra [289] suggested a new extension of fuzzy sets, the so-called Hesitant Fuzzy Sets (HFSs),
to deal with the difficulty that often arises when the membership degree of an element must be established, a difficulty not
due to an error margin (as in intuitionistic fuzzy sets, IFS) or to a possibility distribution (as in Type-2 fuzzy sets, T2FS), but
rather to the fact that there are some possible values that create a hesitation over which to choose [255]. Thus, Hesitant
Fuzzy Sets are an extension of Fuzzy Sets in which the membership degree of a given object is defined as a set of possible
values. In particular, a Hesitant Fuzzy Set can be defined as follows.

Let $X$ be an initial universe. A Hesitant Fuzzy Set (HFS) $A \subset X$ is defined as follows:


$$A = \{(x, h_A(x)) : x \in X\},$$

where $h_A(x)$ is a set of different values in [0,1], representing possible membership degrees of x to A.


Hesitant Fuzzy Sets theory can be usefully utilized for managing the uncertainty affecting different components of infor- 
mation in the clustering process. In fact, in group decision situations, the information provided by different decision makers 
(experts) can vary significantly. The standard fuzzy clustering schemes are unable to incorporate the differences in the
opinions of different decision makers, that is, they are unsuitable for clustering in hesitant fuzzy environments. HFSs can be used
to solve the issue, because they avoid performing data aggregation and can directly reflect differences of opinion among
decision makers [315]. In this connection, many distance and similarity measures for HFSs that are useful for clustering are
listed in Xu [315]. In particular, Xu and Xia [318] defined different distance measures for hesitant fuzzy sets. Chen et al.
[55] derived a number of correlation coefficient formulas for HFSs and applied them to clustering analysis under hesitant 
fuzzy environments. Zhang and Xu [312] extended the agglomerative hierarchical clustering algorithm for classifying hesi- 
tant fuzzy information. Chen et al. [54] derived a number of correlation coefficient formulas for HFSs and applied them to 
clustering analysis under hesitant fuzzy environments. Chen et al. [54] investigated the clustering technique for HFSs based 
on the k-means clustering algorithm which takes the results of hierarchical clustering as the initial input. Zhang and Xu 
[342] suggested a hesitant fuzzy minimal spanning tree (HFMST) clustering algorithm under hesitant fuzzy environment. 
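
To give an idea of how a distance between hesitant fuzzy elements can be computed, the following Python sketch implements a normalized Hamming-type distance in the spirit of the measures discussed in this literature. The specific conventions used here are assumptions made for illustration: the values of each element are sorted, and the shorter element is extended by repeating its smallest value (pessimistic rule) or its largest value (optimistic rule) until both have the same length. The function names and the numerical values are hypothetical.

```python
def extend(h, length, optimistic=False):
    """Sort a non-empty hesitant fuzzy element and pad it to `length` values."""
    values = sorted(h)
    pad = values[-1] if optimistic else values[0]
    return sorted(values + [pad] * (length - len(values)))


def hesitant_hamming(h1, h2, optimistic=False):
    """Normalized Hamming-type distance between two hesitant fuzzy elements."""
    length = max(len(h1), len(h2))
    a = extend(h1, length, optimistic)
    b = extend(h2, length, optimistic)
    return sum(abs(x - y) for x, y in zip(a, b)) / length


# Hypothetical membership degrees given by three experts and by two experts.
h1 = [0.2, 0.4, 0.5]
h2 = [0.3, 0.6]
print(hesitant_hamming(h1, h2))
print(hesitant_hamming(h1, h2, optimistic=True))
```

A distance of this kind can be plugged into hierarchical or partitioning clustering procedures in place of the usual distance between crisp membership degrees, without first aggregating the opinions of the experts.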

Zhang and Xu [343] proposed a novel concept of hesitancy index of a hesitant fuzzy set to measure the hesitancy degree
among the possible values in each hesitant fuzzy element of the hesitant fuzzy set. By taking into account their hesitancy
indices, they suggested new methods for measuring the distances between hesitant fuzzy sets and discussed their properties.
Subsequently, according to the relationship between the distance measure and the similarity measure, they proposed two
novel similarity measures for hesitant fuzzy sets and then defined a novel hesitant fuzzy clustering algorithm on the basis of
these similarity measures for classifying objects with hesitant fuzzy sets. Zhang and Xu [344] proposed a novel hesitant
fuzzy agglomerative hierarchical clustering algorithm for HFSs. The algorithm considers each of the given HFSs as a unique 
cluster in the first stage, and then compares each pair of the HFSs by utilizing the weighted Hamming distance or the 
weighted Euclidean distance. Aliahmadipour et al. [6] defined a method to construct H-fuzzy partitions from a set of fuzzy 
clusters obtained from several executions of fuzzy clustering algorithms with various initializations of their parameters. 
Other hesitant fuzzy clustering methods have been suggested recently by Aliahmadipour and Eslami [5] and Torra et al. 
[291]. 

In line with the Information-Uncertainty perspective, the Hesitant Fuzzy Sets-based clustering methods take into account 
uncertainty by means of a set of different values, representing possible membership degree measures. 


4.11. Interval-based fuzzy clustering 


In the literature, there are several clustering methods based on a standard (non fuzzy) approach for managing imprecise 
data represented in the form of interval-valued data. See for instance de Souza and de Carvalho [88]; Billard and Diday [36]. 

Also as part of the fuzzy approach, many clustering methods for classifying imprecise data modeled as interval data have 
been suggested in the literature. See, e.g., de Carvalho and Tenorio (2004); D’Urso and Giordani (2006); D’Urso et al. [78,79]; 
D'Urso and Leski [82]. In all these methods the uncertainty is measured by point values of the membership degrees. 

Two interesting clustering methods based on interval-valued set theory have been proposed by Silva et al. [273]. In 
particular, they proposed an interval distance measure and, by combining interval distance and interval arithmetic 
in the clustering process, two clustering methods in which the membership degrees are interval-valued; thus, in these 
methods the uncertainty is measured by means of intervals of membership degrees. 


4.12. Picture fuzzy clustering 


Recently, Cuong and Kreinovich [73] introduced the concept of Picture Fuzzy Sets (PFSs) as extensions of the Fuzzy Sets 
(FSs) and the Intuitionistic Fuzzy Sets (IFSs). Formally, PFS can be defined as follows. 
Let $X$ be an initial universe. A Picture Fuzzy Set (PFS) $A \subset X$ is defined as follows: 

\[ A = \{ (x, \mu_A(x), \eta_A(x), \nu_A(x)) : x \in X \}, \]

where $\mu_A(x)$ represents the degree of positive membership of $x$ to $A$, $\eta_A(x)$ indicates the degree of neutral membership of 
$x$ to $A$ and $\nu_A(x)$ denotes the degree of negative membership of $x$ to $A$, with the constraints: 

\[ \mu_A(x) \in [0, 1], \quad \eta_A(x) \in [0, 1], \quad \nu_A(x) \in [0, 1] \quad \text{and} \quad 0 \le \mu_A(x) + \eta_A(x) + \nu_A(x) \le 1, \quad \forall x \in X, \]

and then 

\[ \pi_A(x) = 1 - (\mu_A(x) + \eta_A(x) + \nu_A(x)) \]

represents the degree of refusal membership of $x$ to $A$. 

For more details on PFSs see Cuong and Kreinovich [73] and Cuong [72]. 
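
To make the definition concrete, the following minimal sketch represents the description of a single element of a PFS, checks the constraints on the positive, neutral and negative membership degrees, and derives the degree of refusal membership. The class name, field names and numerical tolerance are illustrative assumptions, not part of the cited works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PictureFuzzyElement:
    positive: float  # degree of positive membership
    neutral: float   # degree of neutral membership
    negative: float  # degree of negative membership

    def __post_init__(self):
        degrees = (self.positive, self.neutral, self.negative)
        if any(not 0.0 <= d <= 1.0 for d in degrees):
            raise ValueError("each degree must lie in [0, 1]")
        # Small tolerance (an assumption) to absorb floating-point rounding.
        if sum(degrees) > 1.0 + 1e-12:
            raise ValueError("the three degrees must sum to at most 1")

    @property
    def refusal(self) -> float:
        """Degree of refusal membership: what remains after the three degrees."""
        return 1.0 - (self.positive + self.neutral + self.negative)


# Hypothetical element: 0.5 positive, 0.2 neutral, 0.1 negative.
e = PictureFuzzyElement(positive=0.5, neutral=0.2, negative=0.1)
print(round(e.refusal, 10))
```

When the neutral degree is zero for every element, the description reduces to that of an intuitionistic fuzzy set, with the refusal degree playing the role of the hesitation margin.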

More recently, a number of clustering methods based on PFSs have been proposed. Son [277] proposed a distributed 
picture fuzzy clustering method on PFSs. Thong and Son [284] proposed a hybrid system for medical diagnosis 
that combines picture fuzzy clustering with an intuitionistic fuzzy recommender. Subsequently, the same authors 
published further contributions on picture fuzzy clustering [278,279,281,285,286,287]. 

From an information-uncertainty viewpoint, the uncertainty in the picture fuzzy clustering is managed by means of the 
different types of membership degrees, i.e. degree of positive membership, degree of neutral membership and degree of 
negative membership. 
Table 7 
Seminal papers on the different uncertain clustering approaches and connected papers on the uncertainty theories (ordered with respect to the first 
paper of each clustering approach). 

| Year | Uncertain clustering approaches | Year | Uncertainty theories |
|---|---|---|---|
| | Fuzzy clustering | | Fuzzy Sets (FSs) theory |
| 1966 | Bellman, Kalaba, Zadeh [33] | 1965 | Zadeh [332] |
| 1969 | Ruspini [260] | | |
| 1970 | Ruspini [261] | | |
| 1973 | Ruspini [262] | | |
| 1974 | Bezdek [35] | | |
| 1974 | Dunn [356] | | |
| 1981 | Bezdek [355] | | |
| | Possibilistic clustering | | Possibilistic (Poss) theory |
| 1993 | Krishnapuram and Keller [152] | 1978 | Zadeh [333] |
| 1996 | Krishnapuram and Keller [152] | | |
| | Shadowed clustering | | Shadowed Sets (SSs) theory |
| 1998 | Pedrycz [225] | 1998 | Pedrycz [225] |
| 2010 | Mitra et al. [204] (shadowed clustering integrating fuzzy and rough clustering) | | |
| 2011 | Zhou et al. [135] (shadowed set-based rough-fuzzy clustering) | | |
| | Rough set-based clustering | | Rough Sets (RSs) theory |
| 2004 | Lingras and West [171] (rough clustering) | 1982 | Pawlak [221] |
| 2006 | Peters [236] (refined rough clustering) | | |
| 2006 | Mitra, Banka, Pedrycz [206] (rough fuzzy clustering) | | |
| 2007 | Maji and Pal [183] (rough fuzzy clustering) | | |
| 2007 | Maji and Pal [184] (rough fuzzy possibilistic clustering) | | |
| | Intuitionistic fuzzy clustering | | Intuitionistic Fuzzy Sets (IFSs) theory |
| 2004 | Hung et al. [130] | 1986 | Atanassov [25] |
| 2008 | Pelekis et al. [234] | | |
| 2008 | Torra et al. [292] | | |
| 2008 | Xu et al. [316] | | |
| | Evidential clustering or credal clustering or Belief clustering | | Credal Sets (CSs) theory, Dempster-Shafer (DS) theory |
| 2004 | Denœux, Masson [98] | 1967, 1976 | Dempster [90], Shafer [268] |
| 2008 | Denœux, Masson [190] | 1997 | Cozman [68] |
| | Credibilistic clustering | | Credibilistic (Cred) theory |
| 2007 | Zhou et al. [350] | 2004 | Liu [174] |
| 2014 | Niakan et al. [216] | | |
| 2015 | Zhou et al. [348] | | |
| | Type-2 fuzzy clustering | | Type-2 Fuzzy Sets (T2FSs) theory |
| 2007 | Hwang and Rhee [359] (interval Type-2 fuzzy clustering) | 1975 | Zadeh [353] |
| 2009 | Min et al. [198] (interval Type-2 possibilistic clustering) | | |
| 2012 | Linda and Manic [166] (Type-2 fuzzy clustering or general Type-2 fuzzy clustering) | | |
| 2014 | Ji et al. [134] (interval Type-2 fuzzy possibilistic clustering) | | |
| | Neutrosophic clustering | | Neutrosophic Sets (NSs) theory |
| 2012 | Shan et al. [269] | 1998 | Smarandache [274] |
| 2015 | Guo and Sengür [119] | | |
| | Hesitant fuzzy clustering | | Hesitant Fuzzy Sets (HFSs) theory |
| 2013 | Chen et al. [55] | 2009 | Torra and Narukawa [290] |
| | Interval-based fuzzy clustering | | Interval Sets (ISs) theory |
| 2015 | Silva et al. [273] | 1966 | Moore [211] |
| | Picture fuzzy clustering | | Picture Fuzzy Sets (PFSs) theory |
| 2015 | Son [277] | 2013 | Cuong and Kreinovich [73] |






5. A final summary overview 


As we have seen in the previous sections, after the first methods of fuzzy clustering several methods based on different 
approaches to manage uncertainty have been proposed. 

To give an overview of the period over which these uncertain clustering methods were proposed, 
Fig. 10 shows the timeline of the earliest and most relevant papers on the various uncertain clustering approaches, together with the first 
papers on the related uncertainty theories. 

In Table 7, we have ordered chronologically the seminal papers on the various uncertain clustering approaches and the 
related papers on uncertainty theories (the chronological order is with respect to the first paper on the different clustering 
approaches). 

Fig. 10 and Table 7 show that the time gap between the first paper on an uncertainty 
theory and the first paper on the connected uncertain clustering approach is sometimes short (e.g., Fuzzy Sets 
and fuzzy clustering) and sometimes very long (e.g., Rough Sets, Intuitionistic Fuzzy Sets, Belief 
theory, Type-2 Fuzzy Sets and Neutrosophic Sets and their respective clustering approaches). 
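
The gaps can be computed directly from the years reported in Table 7. The short sketch below does so; the variable names are illustrative, and for evidential clustering the earlier of the two Dempster-Shafer dates (1967) is taken as the theory year, which is a choice made here rather than one stated in the table.

```python
# (clustering approach, year of first theory paper, year of first clustering paper),
# with years as listed in Table 7.
FIRST_PAPERS = [
    ("Fuzzy", 1965, 1966),
    ("Possibilistic", 1978, 1993),
    ("Shadowed", 1998, 1998),
    ("Rough set-based", 1982, 2004),
    ("Intuitionistic fuzzy", 1986, 2004),
    ("Evidential", 1967, 2004),  # assumption: earliest DS-theory date
    ("Credibilistic", 2004, 2007),
    ("Type-2 fuzzy", 1975, 2007),
    ("Neutrosophic", 1998, 2012),
    ("Hesitant fuzzy", 2009, 2013),
    ("Interval-based fuzzy", 1966, 2015),
    ("Picture fuzzy", 2013, 2015),
]

# Gap in years between the theory and its first clustering application.
gaps = {name: clustering - theory for name, theory, clustering in FIRST_PAPERS}

for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name:22s}{gap:3d} years")
```

With these figures, fuzzy clustering followed its theory after one year, whereas the rough set-based, intuitionistic fuzzy, evidential and Type-2 fuzzy approaches each appeared roughly two to four decades after the corresponding theory.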
Fig. 10. Timeline of the first more relevant papers on the different clustering approaches with management of uncertainty (Note that the labels of the papers on the clustering approaches and of the connected 
Table 8 
Clustering approaches, theoretical formalisms and uncertainty measures. 

| Uncertain theoretical formalisms | Uncertain clustering approaches | Uncertain measures |
|---|---|---|
| Fuzzy Sets (FSs) | Fuzzy clustering | point membership degrees |
| Possibilistic theory | Possibilistic clustering | point possibility degrees (or compatibility degrees) |
| | Hybrid clustering (Fuzzy-Possibilistic clustering) | point membership degrees and typicality degrees |
| Interval Sets (ISs) and Fuzzy Sets (FSs) | Interval-based fuzzy clustering | intervals of membership degrees |
| Dempster-Shafer (DS) theory and Credal Sets (CSs) | Evidential clustering or credal clustering or Belief clustering | Dempster-Shafer mass functions or degree of belief |
| Type-2 Fuzzy Sets (T2FSs) and Interval Type-2 Fuzzy Sets (IT2FSs) | Interval Type-2 fuzzy clustering | upper and lower fuzzifier parameters, interval fuzzifier parameters, upper and lower membership degrees, interval membership degrees, footprint of uncertainty |
| Type-2 Fuzzy Sets (T2FSs) and Interval Type-2 Fuzzy Sets (IT2FSs) | Interval Type-2 possibilistic clustering | upper and lower fuzzifier parameters, interval fuzzifier parameters, upper and lower typicality degrees, interval typicality degrees, footprint of uncertainty |
| Type-2 Fuzzy Sets (T2FSs) and Interval Type-2 Fuzzy Sets (IT2FSs) | Interval Type-2 fuzzy-possibilistic clustering | upper and lower fuzzifier parameters for membership and typicality degrees, interval fuzzifier parameters, upper and lower membership degrees, upper and lower typicality degrees, interval membership degrees, interval typicality degrees, footprint of uncertainty |
| Type-2 Fuzzy Sets (T2FSs) and Interval Type-2 Fuzzy Sets (IT2FSs) | Type-2 fuzzy clustering | linguistic fuzzifier parameter, α-plane of membership degrees |
| Intuitionistic Fuzzy Sets (IFSs) | Intuitionistic fuzzy clustering | belongingness degrees (membership degrees) and non-belongingness degrees (non-membership degrees) |
| Rough Sets (RSs) and Rough Fuzzy Sets (RFSs) | Rough clustering | crisp lower and upper approximations |
| Rough Sets (RSs) and Rough Fuzzy Sets (RFSs) | Rough Fuzzy clustering and Rough Fuzzy Possibilistic clustering | fuzzy and/or crisp lower and upper approximations, fuzzy boundary, fuzzy-possibilistic boundary |
| Neutrosophic Sets (NSs) | Neutrosophic clustering | truth-membership degrees, indeterminacy-membership degrees, falsity-membership degrees |
| Shadowed Sets (SSs) | Shadowed clustering | exclusion, shadowed, and core zones |
| Credibilistic theory | Credibilistic clustering | credibilistic measures |
| Hesitant Fuzzy Sets (HFSs) | Hesitant fuzzy clustering | membership degrees defined as a set of possible values |
| Picture Fuzzy Sets (PFSs) | Picture fuzzy clustering | positive membership degrees, neutral membership degrees, negative membership degrees |





All this highlights how differently, over time, the various theories of uncertainty have been taken up by the 
methodological approaches adopted for clustering objects under uncertainty; that is, the process of assimilating the 
theoretical results into the respective clustering methodologies has varied from one theory to another. 

In Table 8, we summarize, for each uncertain clustering approach, the theoretical formalism that inspired it 
and the measures adopted for managing uncertainty in the clustering process. 
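
To illustrate the difference between two of the measures listed in Table 8, the sketch below contrasts point membership degrees with intervals of membership degrees for a single object. The point degrees follow the standard fuzzy c-means membership formula; the intervals are built from two fuzzifier values in the spirit of interval Type-2 fuzzy clustering. The function names, the distances and the fuzzifier values are hypothetical, and the construction is a simplified illustration rather than the procedure of any specific paper cited here.

```python
def fcm_memberships(distances, m):
    """Point membership degrees of one object in each cluster, using the
    standard fuzzy c-means formula (fuzzifier m > 1, distances > 0)."""
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


def interval_memberships(distances, m1, m2):
    """Lower and upper membership degrees obtained from two fuzzifiers."""
    u1 = fcm_memberships(distances, m1)
    u2 = fcm_memberships(distances, m2)
    return [(min(a, b), max(a, b)) for a, b in zip(u1, u2)]


# Hypothetical distances from one object to three cluster prototypes.
distances = [1.0, 2.0, 4.0]
print(fcm_memberships(distances, m=2.0))                 # one number per cluster
print(interval_memberships(distances, m1=1.5, m2=3.0))   # one interval per cluster
```

With point degrees, the uncertainty about the assignment is summarized by a single number per cluster; with intervals, the width of each interval additionally expresses uncertainty about the membership degree itself.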


6. Conclusions 


In statistical reasoning, uncertainty affects the empirical and theoretical information of the knowledge process. 
For this reason, various theoretical platforms have been proposed in the literature for formalizing, measuring and 
modeling uncertainty. Focusing on the Fuzzy Sets theory [332] and its extensions and derivatives, this paper has explored the specific 
area of Cluster Analysis. 

In the last 50 years, after the first paper on fuzzy clustering [33], many clustering methods have been proposed. In fact, 
following different theoretical approaches for managing the uncertainty affecting the classification process, many extensions 
and derivatives of fuzzy clustering have been suggested. 

In this connection, using the various uncertainty theories, in each clustering approach, suitable uncertainty measures 
have been defined for managing uncertainty affecting the empirical and theoretical informational ingredients of clustering 
methodology. 

In this paper, we have presented an organic and systematic literature review of different uncertain clustering approaches, 
i.e. Fuzzy clustering, Possibilistic clustering, Shadowed clustering, Rough sets-based clustering, Intuitionistic fuzzy clustering, 
Evidential clustering, Credibilistic clustering, Type-2 fuzzy clustering, Neutrosophic clustering, Hesitant fuzzy clustering, 
Interval-based fuzzy clustering, and Picture fuzzy clustering. 

In the future, it could be interesting to design, both in a clustering framework and in a more general 
theoretical-methodological context, a theoretical platform and an integrated system in which the several theoretical 
approaches for managing the different types of uncertainty are considered synergistically. In this connection, the ultimate goals 
are: to suitably model uncertainty through innovative and more efficient uncertainty measures, to reduce uncertainty and 
informational redundancy, to improve the informational gain and to optimize the power of the different theoretical 
formalisms. 

These will be among the most exciting challenges in the coming years. 

In this respect, Granular Computing could have a relevant role in establishing a stimulating synergy between individual 
theoretical approaches [224]. 

However, although the Statistical Reasoning System may be an effective and successful approach, a halo of uncertainty 
will always permeate information and therefore knowledge. The only certainty is that there is no certainty. 